Manager, System Software - Triton Inference Server

at NVIDIA
USD 224,000-356,500 per year
Seniority: Middle
On-site


Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development (3), Python (5), GitHub (3), Machine Learning (3), TensorFlow (3), gRPC (6), Mentoring (3), Protobuf (6), Prioritization (3), Performance Optimization (5), Jira (6), Debugging (5), API (6), LLM (3), PyTorch (3), Agile (3), CUDA (3), GPU (3)

Details

NVIDIA is searching for a passionate Software Engineering Manager to lead the Triton Inference Server team. Triton is cutting-edge, open-source inference software that powers AI deployment across cloud, data center, edge, and embedded devices, supporting models from TensorRT, TensorFlow, PyTorch, ONNX, and more. Join us to shape the future of scalable, production-ready AI solutions used by innovators around the globe.

Responsibilities

  • Guide, mentor, and develop an inclusive and collaborative engineering team focused on delivering robust model serving solutions.
  • Drive planning, prioritization, and execution for projects that improve Triton’s scalability, performance, and reliability in non-generative AI deployments.
  • Foster partnerships with Product and Program Management to create feature roadmaps, manage cross-team dependencies, and balance project resources for both cloud and on-premises platforms.
  • Partner with internal stakeholders and external customers to understand their use cases and translate their needs into product features.
  • Promote engineering excellence through modern, agile development practices and a culture of quality and accountability.

Requirements

  • Master’s or PhD, or equivalent experience, in Computer Science, Computer Engineering, or a related field.
  • Eight or more years of overall hands-on software development experience in customer-facing environments.
  • At least three years building, mentoring, and leading software engineering teams delivering production-grade solutions.
  • Deep background in scalable serving architectures, with direct experience building cloud-native inference APIs, REST/gRPC/protobuf-based services, or similar technologies.
  • Advanced C++ and Python development skills, demonstrating clean, object-oriented design, as well as proficiency in debugging, performance optimization, and testing.
  • Track record of contributing to or leading large open-source projects — using GitHub for code reviews, bug tracking, and release management.
  • Strong knowledge of agile methodologies and tools such as JIRA and Linear.
  • Ability to communicate technical topics with clarity and empathy to colleagues, partners, and diverse audiences.

Ways to stand out

  • Experience working within distributed, global teams.
  • Practical knowledge of machine learning model deployment with frameworks such as TensorRT, TRT-LLM, PyTorch, ONNX, Python, or similar platforms.
  • Understanding of CPU and GPU architectures.
  • Skills in GPU programming (for example, CUDA or OpenCL).

Compensation & Benefits

  • Base salary range: USD 224,000-356,500 (determined based on location, experience, and internal pay practices).
  • Eligible for equity and company benefits (link provided in original posting).

Additional Information

  • Applications accepted at least until September 23, 2025.
  • Location specified: Santa Clara, California, United States.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.