Principal Software Engineer, Profiling Services

at NVIDIA
USD 272,000-425,500 per year
SENIOR
✅ On-site

Required Skills & Competences

  • Security @ 4
  • Leadership @ 4
  • Communication @ 7
  • Debugging @ 4
  • API @ 7
  • Technical Leadership @ 4
  • PyTorch @ 4
  • CUDA @ 4
  • GPU @ 4

Details

Design and ship an Always-On, low-overhead GPU profiling service that runs in production, scales across cluster environments, and delivers actionable insights for ML workloads. You will lead the architecture and hands-on delivery across system software, drivers, and CUDA to make profiling continuously available and reliable.

Responsibilities

  • Own the architecture for an Always-On profiling service, defining interfaces, data flows, and scalability guarantees for multi-process, multi-GPU, and multi-node systems.
  • Drive low-overhead, high-reliability implementations in C/C++, including IPC/shared memory, lock-free buffers, and bounded CPU/memory budgets with clear benchmarks.
  • Lead end-to-end feature delivery spanning user-mode components, driver/platform layers, and performance counter/trace providers.
  • Establish profiling models that integrate with existing ML/AI workflows (e.g., PyTorch/XLA) to turn low-level signals into actionable insights.
  • Set technical direction for an engineering team; mentor engineers, drive technical planning to mitigate architectural risks, and align roadmaps across internal and external partners.

Requirements

  • BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
  • 15+ years of system-level C/C++ development, including concurrency, memory management, and performance engineering.
  • Expertise with profiling/tracing stacks for CPU/GPU (e.g., CUPTI, Nsight, performance counters, event correlation) and debugging concurrent systems.
  • Deep hands-on CUDA and GPU architecture knowledge (runtime/driver APIs, CUDA streams/graphs, kernel behavior).
  • Proven experience designing and shipping production-quality system software or drivers with strict reliability, observability, and performance constraints.
  • Demonstrated technical leadership: defining architecture and success metrics, and translating abstract product visions into actionable technical roadmaps with fast-paced, multidisciplinary teams.
  • Strong interpersonal, verbal, and written communication skills; able to influence across organizations and build trust with external collaborators.

Ways to stand out

  • Track record building continuous/always-on or multi-client profiling systems with predictable overhead at scale.
  • Hands-on experience tuning ML training/inference loops based on deep profiling analysis.
  • Familiarity with ML ecosystems (e.g., PyTorch, JAX) and correlating application-level events with GPU traces/metrics.
  • Strong background in translating profiling data into actionable performance insights (compute-bound vs. memory-bound analysis, bottleneck triage).
  • Experience with user-mode driver development and integration with platform permissions/security models.

Benefits & Additional Information

  • Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and pay of employees in similar positions).
  • Eligible for equity and benefits.
  • Applications accepted at least until December 20, 2025.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.