Manager, System Software - Triton Inference Server

at NVIDIA
USD 224,000-356,500 per year
Seniority: Middle
On-site


Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development (3), Python (5), GitHub (3), Machine Learning (3), TensorFlow (3), gRPC (6), Mentoring (3), Protobuf (6), Prioritization (3), Performance Optimization (5), Jira (6), Debugging (5), API (6), LLM (3), PyTorch (3), Agile (3), CUDA (3), GPU (3)

Details

NVIDIA is searching for a passionate Software Engineering Manager to lead the Triton Inference Server team. Triton is cutting-edge, open-source inference software that powers AI deployment across cloud, data center, edge, and embedded devices, supporting models from TensorRT, TensorFlow, PyTorch, ONNX, and more. Join us to shape the future of scalable, production-ready AI solutions used by innovators around the globe.

Responsibilities

  • Guide, mentor, and develop an inclusive and collaborative engineering team focused on delivering robust model serving solutions.
  • Drive planning, prioritization, and execution for projects that improve Triton’s scalability, performance, and reliability in non-generative AI deployments.
  • Foster partnerships with Product and Program Management to create feature roadmaps, manage cross-team dependencies, and balance project resources for both cloud and on-premises platforms.
  • Partner with internal stakeholders and external customers to understand their use cases and translate their needs into product features.
  • Promote engineering excellence through modern, agile development practices and a culture of quality and accountability.

Requirements

  • Master’s or PhD, or equivalent experience, in Computer Science, Computer Engineering, or a related field.
  • Eight or more years of overall hands-on software development experience in customer-facing environments.
  • At least three years building, mentoring, and leading software engineering teams delivering production-grade solutions.
  • Deep background in scalable serving architectures, with direct experience building cloud-native inference APIs, REST/gRPC/protobuf-based services, or similar technologies.
  • Advanced C++ and Python development skills, demonstrating clean, object-oriented design, as well as proficiency in debugging, performance optimization, and testing.
  • Track record of contributing to or leading large open-source projects — using GitHub for code reviews, bug tracking, and release management.
  • Strong knowledge of agile methodologies and tools such as JIRA and Linear.
  • Ability to communicate technical topics with clarity and empathy to colleagues, partners, and diverse audiences.

Ways to stand out

  • Experience working within distributed, global teams.
  • Practical knowledge of machine learning model deployment with frameworks such as TensorRT, TRT-LLM, PyTorch, ONNX, Python, or similar platforms.
  • Understanding of CPU and GPU architectures.
  • Skills in GPU programming (for example, CUDA or OpenCL).

Compensation & Benefits

  • Base salary range: USD 224,000-356,500 (determined based on location, experience, and internal pay practices).
  • Eligible for equity and company benefits (link provided in original posting).

Additional Information

  • Applications accepted at least until September 23, 2025.
  • Location specified: Santa Clara, California, United States.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.