Senior Software Architect, Always-On Profiling

at Nvidia

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 7
Software Development @ 7
Python @ 7
Machine Learning @ 4
Leadership @ 7
Communication @ 4
Debugging @ 4
API @ 4
Technical Leadership @ 7
Design Patterns @ 4
PyTorch @ 3
CUDA @ 6
GPU @ 4

Details

Advance GPU performance analysis for machine learning workloads by designing, implementing, and leading the Always-On Profiling (AON) service. This role requires deep technical expertise, a proven track record of solving ambiguous challenges, and strong technical leadership.

Responsibilities

Architect and build scalable systems for the AON profiling service, mastering inter-process communication, memory management, and low-overhead architectures for multi-node, multi-process, multi-GPU, and cluster environments.
Promote software engineering excellence with a focus on design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems, ensuring code quality and robust testing.
Lead and mentor engineers, perform code reviews, and shape technical roadmaps while identifying and solving complex technical issues.
Drive full-stack development including planning, prototyping, implementation, testing, and customer evaluation, covering user applications, drivers, performance counter libraries, and hardware abstraction layers.
Collaborate across internal and external teams with effective communication to integrate AON into the broader profiling and ML ecosystem.

Requirements

BS or MS degree or equivalent experience in Computer Engineering, Computer Science, or related field.
8+ years of software development experience in C, C++, and Python.
10+ years of experience in system software design, operating system fundamentals, computer architecture, performance analysis, and production-quality software delivery.
Strong communication skills for cross-organizational partnerships and technical leadership.
Expert knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, counters, API traces).
Proficiency with CUDA APIs, runtime, streams, kernels, and GPU architecture.
Familiarity with ML frameworks such as PyTorch, JAX, and performance analysis for AI training/inference.
Experience with large-scale system development and debugging including user mode and kernel drivers.
Skilled in designing APIs and interfaces for profiling tools enabling integration with various frameworks.
Proven ability to simplify complex problems and lead the work to solve them.

Nice to Have

Experience designing low-overhead profiling systems for complex distributed environments.
Deep understanding of PyTorch internals and CUDA, including tensor memory and distributed training.
Competence in GPU performance analysis and translating profiling data into actionable insights.
Skilled in translating customer needs into actionable use cases.
Strong understanding of system security principles.

Benefits

Base salary range: 184,000 USD - 356,500 USD, dependent on location, experience, and comparable employee pay.
Eligibility for equity and benefits.
Applications are accepted on an ongoing basis.
Commitment to diversity and equal opportunity employment without discrimination.