Senior Software Engineer, Profiling Services

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Security @ 4
  • Software Development @ 7
  • Machine Learning @ 4
  • Leadership @ 7
  • Communication @ 4
  • Debugging @ 4
  • API @ 3
  • Technical Leadership @ 7
  • Design Patterns @ 4
  • PyTorch @ 3
  • CUDA @ 4
  • GPU @ 4

Details

Join the Developer Tools Always-On Profiling (AON) team to design, implement, and lead an Always-On Profiling service focused on GPU performance analysis for Machine Learning workloads. This role requires deep technical expertise in system and performance engineering, strong technical leadership, and proven experience delivering production-quality system software across complex multi-node, multi-process, multi-GPU environments.

Responsibilities

  • Architect and build scalable systems for the AON profiling service, including inter-process communication (IPC), memory management, and low-overhead architectures for profiling data from multi-node, multi-process, multi-GPU and cluster environments.
  • Promote engineering excellence: apply and evangelize design patterns, concurrency and parallelism best practices, advanced debugging techniques for asynchronous systems, and robust testing.
  • Lead and mentor engineers: provide impactful code reviews, shape technical roadmaps, and guide teams through ambiguous or complex technical problems.
  • Drive full-stack development: translate user needs into requirements and design documents; lead end-to-end feature development from planning and prototyping to implementation, testing, and customer evaluation across user applications, drivers, performance counter libraries, and platform/hardware abstraction layers.
  • Collaborate cross-functionally with internal and external teams to integrate AON into the broader profiling and ML ecosystem.

Requirements

  • BS or MS degree (or equivalent experience) in Computer Engineering, Computer Science, or related field.
  • 8+ years of meaningful software development experience in C and C++.
  • 10+ years of system software design experience, with strong foundations in operating systems, computer architectures, performance analysis, and delivering production-quality software.
  • Strong interpersonal, verbal, and written communication skills; demonstrated ability to build cross-organizational partnerships and lead technical teams.
  • Deep expertise with profiling and performance tools and methodologies (sampling, tracing, overhead analysis), and familiarity with profiling data types (CPU/GPU events, performance counters, API traces, event correlation).
  • In-depth CUDA and GPU knowledge: CUDA APIs, runtime, streams, kernels, and GPU architecture.
  • Familiarity with ML frameworks (PyTorch, JAX) and knowledge of performance analysis for AI training/inference workloads.
  • Experience developing and debugging across large, multi-layered software systems including user-mode and kernel drivers; ability to contribute to and extend substantial codebases.
  • Proficiency designing robust and flexible APIs/interfaces for profiling tools and integrations.
  • Demonstrated ability to simplify ill-defined problems, design effective solutions, and lead teams to implement them.

Preferred / Ways to Stand Out

  • Track record of pioneering low-overhead profiling systems in complex multi-process and distributed environments.
  • Deep understanding of PyTorch internals and CUDA usage, including tensor memory handling and distributed training behaviors.
  • Strong ability to translate profiling data into actionable performance optimizations for CUDA and ML frameworks.
  • Experience translating customer needs into actionable use cases and requirements.
  • Solid understanding of system security principles.

Compensation & Benefits

  • Base salary range (Level 4): 184,000 USD – 287,500 USD.
  • Base salary range (Level 5): 224,000 USD – 356,500 USD.
  • Eligible for equity and benefits.

Other Information

  • Employment type: Full time.
  • Applications accepted at least until August 4, 2025.
  • NVIDIA is an equal opportunity employer and values diversity in its workforce.