Senior Software Architect, Always-On Profiling
at Nvidia
Santa Clara, United States
USD 184,000-356,500 per year
Required Skills & Competences
Security @ 7, Software Development @ 7, Python @ 7, Machine Learning @ 4, Leadership @ 7, Communication @ 4, Debugging @ 4, API @ 4, Technical Leadership @ 7, Design Patterns @ 4, PyTorch @ 3, CUDA @ 6, GPU @ 4
Details
Advance GPU performance analysis for machine learning workloads by designing, implementing, and leading the Always-On Profiling (AON) service. This role requires deep technical expertise, a proven track record of solving ambiguous problems, and strong technical leadership.
Responsibilities
- Architect and build scalable systems for the AON profiling service, mastering inter-process communication, memory management, and low-overhead architectures for multi-node, multi-process, multi-GPU, and cluster environments.
- Promote software engineering excellence with a focus on design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems, ensuring code quality and robust testing.
- Lead and mentor engineers, perform code reviews, and shape technical roadmaps while identifying and solving complex technical issues.
- Drive full-stack development including planning, prototyping, implementation, testing, and customer evaluation, covering user applications, drivers, performance counter libraries, and hardware abstraction layers.
- Collaborate across internal and external teams with effective communication to integrate AON into the broader profiling and ML ecosystem.
Requirements
- BS or MS degree or equivalent experience in Computer Engineering, Computer Science, or related field.
- 8+ years of software development experience in C, C++, and Python.
- 10+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and production-quality software delivery.
- Strong communication skills for cross-organizational partnerships and technical leadership.
- Expert knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, counters, API traces).
- Proficiency with CUDA APIs, runtime, streams, kernels, and GPU architecture.
- Familiarity with ML frameworks such as PyTorch and JAX, and with performance analysis for AI training and inference.
- Experience with large-scale system development and debugging, including user-mode and kernel drivers.
- Skill in designing APIs and interfaces for profiling tools that enable integration with various frameworks.
- Proven ability to simplify complex problems and lead their solutions.
Nice to Have
- Experience designing low-overhead profiling systems for complex distributed environments.
- Deep understanding of PyTorch internals and CUDA, including tensor memory and distributed training.
- Competence in GPU performance analysis and translating profiling data into actionable insights.
- Skill in translating customer needs into actionable use cases.
- Strong understanding of system security principles.
Benefits
- Base salary range: 184,000 USD - 356,500 USD, dependent on location, experience, and comparable employee pay.
- Eligibility for equity and benefits.
- Applications are accepted on an ongoing basis.
- Commitment to diversity and equal opportunity employment without discrimination.