Senior Software Engineer, Profiling Services
at NVIDIA
Santa Clara, United States
USD 184,000-356,500 per year
Required Skills & Competences
Security @ 4, Software Development @ 7, Machine Learning @ 4, Leadership @ 7, Communication @ 4, Debugging @ 4, API @ 3, Technical Leadership @ 7, Design Patterns @ 4, PyTorch @ 3, CUDA @ 4, GPU @ 4
Details
Join the Developer Tools Always-On Profiling (AON) team to design, implement, and lead an Always-On Profiling service focused on GPU performance analysis for Machine Learning workloads. This role requires deep technical expertise in system and performance engineering, strong technical leadership, and proven experience delivering production-quality system software across complex multi-node, multi-process, multi-GPU environments.
Responsibilities
- Architect and build scalable systems for the AON profiling service, including inter-process communication (IPC), memory management, and low-overhead architectures for profiling data from multi-node, multi-process, multi-GPU and cluster environments.
- Promote engineering excellence: apply and evangelize design patterns, concurrency and parallelism best practices, advanced debugging techniques for asynchronous systems, and robust testing.
- Lead and mentor engineers: provide impactful code reviews, shape technical roadmaps, and guide teams through ambiguous/complex technical problems.
- Drive full-stack development: translate user needs into requirements and design documents; lead end-to-end feature development from planning and prototyping to implementation, testing, and customer evaluation across user applications, drivers, performance counter libraries, and platform/hardware abstraction layers.
- Collaborate cross-functionally with internal and external teams to integrate AON into the broader profiling and ML ecosystem.
Requirements
- BS or MS degree (or equivalent experience) in Computer Engineering, Computer Science, or related field.
- 8+ years of meaningful software development experience in C and C++.
- 10+ years of system software design experience, with strong foundations in operating systems, computer architectures, performance analysis, and delivering production-quality software.
- Strong interpersonal, verbal, and written communication skills; demonstrated ability to build cross-organizational partnerships and lead technical teams.
- Deep expertise with profiling and performance tools and methodologies (sampling, tracing, overhead analysis), and familiarity with profiling data types (CPU/GPU events, performance counters, API traces, event correlation).
- In-depth CUDA and GPU knowledge: CUDA APIs, runtime, streams, kernels, and GPU architecture.
- Familiarity with ML frameworks (PyTorch, JAX) and knowledge of performance analysis for AI training/inference workloads.
- Experience developing and debugging across large, multi-layered software systems including user-mode and kernel drivers; ability to contribute to and extend substantial codebases.
- Proficiency designing robust and flexible APIs/interfaces for profiling tools and integrations.
- Demonstrated ability to simplify ill-defined problems, design effective solutions, and lead teams to implement them.
Preferred / Ways to Stand Out
- Track record of pioneering low-overhead profiling systems in complex multi-process and distributed environments.
- Deep understanding of PyTorch internals and CUDA usage, including tensor memory handling and distributed training behaviors.
- Strong ability to translate profiling data into actionable performance optimizations for CUDA and ML frameworks.
- Experience translating customer needs into actionable use cases and requirements.
- Solid understanding of system security principles.
Compensation & Benefits
- Base salary range (Level 4): 184,000 USD - 287,500 USD.
- Base salary range (Level 5): 224,000 USD - 356,500 USD.
- Eligible for equity and benefits.
Other Information
- Employment type: Full time.
- Applications accepted at least until August 4, 2025.
- NVIDIA is an equal opportunity employer and values diversity in its workforce.