Required Skills & Competences
Security @ 4, Leadership @ 4, Communication @ 7, Debugging @ 4, API @ 7, Technical Leadership @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
Design and ship an Always-On, low-overhead GPU profiling service that runs in production, scales across cluster environments, and delivers actionable insights for ML workloads. You will lead the architecture and hands-on delivery across system software, drivers, and CUDA to make profiling continuously available and reliable.
Responsibilities
- Own the architecture for an Always-On profiling service, defining interfaces, data flows, and scalability guarantees for multi-process, multi-GPU, and multi-node systems.
- Drive low-overhead, high-reliability implementations in C/C++, including IPC/shared memory, lock-free buffers, and bounded CPU/memory budgets with clear benchmarks.
- Lead end-to-end feature delivery spanning user-mode components, driver/platform layers, and performance counter/trace providers.
- Establish profiling models that integrate with existing ML/AI workflows (e.g., PyTorch/XLA) to turn low-level signals into actionable insights.
- Set technical direction for an engineering team; mentor engineers, drive technical planning to mitigate architectural risks, and align roadmaps across internal and external partners.
Requirements
- BS or MS degree, or equivalent experience, in Computer Engineering, Computer Science, or a related field.
- 15+ years of system-level C/C++ development, including concurrency, memory management, and performance engineering.
- Expertise with profiling/tracing stacks for CPU/GPU (e.g., CUPTI, Nsight, performance counters, event correlation) and debugging concurrent systems.
- Deep hands-on CUDA and GPU architecture knowledge (runtime/driver APIs, CUDA streams/graphs, kernel behavior).
- Proven experience designing and shipping production-quality system software or drivers with strict reliability, observability, and performance constraints.
- Demonstrated technical leadership: defining architecture and success metrics, and translating abstract product visions into actionable technical roadmaps with fast-paced, multidisciplinary teams.
- Strong interpersonal, verbal, and written communication skills; able to influence across organizations and build trust with external collaborators.
Ways to stand out
- Track record building continuous/always-on or multi-client profiling systems with predictable overhead at scale.
- Hands-on experience tuning ML training/inference loops based on deep profiling analysis.
- Familiarity with ML ecosystems (e.g., PyTorch, JAX) and correlating application-level events with GPU traces/metrics.
- Strong background in translating profiling data into actionable performance insights (compute vs memory bound, bottleneck triage).
- Experience with user-mode driver development and integration with platform permissions/security models.
Benefits & Additional Information
- Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits.
- Applications accepted at least until December 20, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.