Senior Software Architect, Always-On Profiling

at Nvidia
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 7, Software Development @ 7, Python @ 7, Machine Learning @ 4, Leadership @ 7, Communication @ 4, Debugging @ 4, API @ 4, Technical Leadership @ 7, Design Patterns @ 4, PyTorch @ 3, CUDA @ 6, GPU @ 4

Details

Advance GPU performance analysis for Machine Learning workloads by designing, implementing, and leading the Always-On Profiling (AON) service. This role requires deep technical expertise, a proven track record of solving ambiguous challenges, and strong technical leadership.

Responsibilities

  • Architect and build scalable systems for the AON profiling service, mastering inter-process communication, memory management, and low-overhead architectures for multi-node, multi-process, multi-GPU, and cluster environments.
  • Promote software engineering excellence with a focus on design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems, ensuring code quality and robust testing.
  • Lead and mentor engineers, perform code reviews, and shape technical roadmaps while identifying and solving complex technical issues.
  • Drive full-stack development including planning, prototyping, implementation, testing, and customer evaluation, covering user applications, drivers, performance counter libraries, and hardware abstraction layers.
  • Collaborate across internal and external teams with effective communication to integrate AON into the broader profiling and ML ecosystem.

Requirements

  • BS or MS degree or equivalent experience in Computer Engineering, Computer Science, or related field.
  • 8+ years of software development experience in C, C++, and Python.
  • 10+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and production-quality software delivery.
  • Strong communication skills for cross-organizational partnerships and technical leadership.
  • Expert knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, counters, API traces).
  • Proficiency with CUDA APIs, runtime, streams, kernels, and GPU architecture.
  • Familiarity with ML frameworks such as PyTorch and JAX, and with performance analysis for AI training/inference.
  • Experience with large-scale system development and debugging including user mode and kernel drivers.
  • Skilled in designing APIs and interfaces for profiling tools enabling integration with various frameworks.
  • Proven ability to simplify complex problems and lead solutions.

Nice to Have

  • Experience designing low-overhead profiling systems for complex distributed environments.
  • Deep understanding of PyTorch internals and CUDA, including tensor memory and distributed training.
  • Competence in GPU performance analysis and translating profiling data into actionable insights.
  • Skilled in translating customer needs into actionable use cases.
  • Strong understanding of system security principles.

Benefits

  • Base salary range: 184,000 USD - 356,500 USD, dependent on location, experience, and comparable employee pay.
  • Eligibility for equity and benefits.
  • Applications are accepted on an ongoing basis.
  • Commitment to diversity and equal opportunity employment without discrimination.