Senior System Software Engineer - Dynamo-Triton Inference Server

at Nvidia

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know all the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 3, GitHub @ 4, Distributed Systems @ 4, Hiring @ 4, Communication @ 7, Networking @ 4, Rust @ 3, Debugging @ 7, OSS @ 4, LLM @ 4, PyTorch @ 4, Agile @ 7, GPU @ 4, Deep Learning @ 4, AI @ 4, vLLM @ 4, TensorRT @ 4, Performance Analysis @ 7

Details

We are hiring a Senior System Software Engineer to work on the Dynamo-Triton Inference Server. The team builds a GPU-accelerated deep learning inference platform that makes designing and deploying AI models easier and more accessible to users across academic and commercial domains.

Responsibilities

Develop world-class GPU-accelerated AI inference serving software.
Contribute to feature development and drive broad customer adoption.
Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform serving both Large Language Model (LLM) and non-LLM workloads.
Be an active member of the open source deep learning software engineering community.
Build robust software designed to be deployed in production server or cloud environments, optimize and balance prediction throughput and latency, and develop/adopt next-generation inference technologies.

Requirements

MS or PhD in Computer Science or a relevant field (or equivalent experience).
5+ years of professional experience working on deep learning software.
Excellent Rust and C++ skills; familiarity with Python.
Strong programming and software design skills, including debugging, performance analysis, and test design.
Experience with high-scale distributed systems and ML systems.
Strong communication skills and ability to work in a fast-paced, agile team environment.

Ways to stand out

Prior experience with AI frameworks and engines such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM.
Knowledge of GPU memory management, cache management, or high-performance networking.
Experience with distributed systems programming.
Experience contributing to large open source projects (use of GitHub, bug tracking, branching/merging code, OSS licensing and patch handling).

Compensation and other details

Base salary ranges by level:
- Level 3: 152,000 USD - 241,500 USD
- Level 4: 184,000 USD - 287,500 USD
Eligible for equity and benefits.
Location: Santa Clara, California, United States.
Applications accepted at least until February 22, 2026.
This posting is for an existing vacancy. NVIDIA uses AI tools in recruiting processes and is an equal opportunity employer.