Senior System Software Engineer - Dynamo-Triton Inference Server
at NVIDIA
USD 152,000-287,500 per year
Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 3
GitHub @ 4
Distributed Systems @ 4
Hiring @ 4
Communication @ 7
Networking @ 4
Rust @ 3
Debugging @ 7
OSS @ 4
LLM @ 4
PyTorch @ 4
Agile @ 7
GPU @ 4
Deep Learning @ 4
AI @ 4
vLLM @ 4
TensorRT @ 4
Performance Analysis @ 7
Details
We are hiring a Senior System Software Engineer to work on the Dynamo-Triton Inference Server. The team builds a GPU-accelerated deep learning inference platform that makes designing and deploying AI models easier and more accessible to users across academic and commercial domains.
Responsibilities
- Develop world-class GPU-accelerated AI inference serving software.
- Contribute to feature development and drive broad customer adoption.
- Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform serving both Large Language Model (LLM) and non-LLM workloads.
- Be an active member of the open source deep learning software engineering community.
- Build robust software designed to be deployed in production server or cloud environments, optimize and balance prediction throughput and latency, and develop/adopt next-generation inference technologies.
Requirements
- MS or PhD in Computer Science or a relevant field (or equivalent experience).
- 5+ years of professional experience working on deep learning software.
- Excellent Rust and C++ skills; familiarity with Python.
- Strong programming and software design skills, including debugging, performance analysis, and test design.
- Experience with high-scale distributed systems and ML systems.
- Strong communication skills and ability to work in a fast-paced, agile team environment.
Ways to stand out
- Prior experience with AI frameworks and engines such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM.
- Knowledge of GPU memory management, cache management, or high-performance networking.
- Experience with distributed systems programming.
- Experience contributing to large open source projects (use of GitHub, bug tracking, branching/merging code, OSS licensing and patch handling).
Compensation and other details
- Base salary ranges by level:
  - Level 3: 152,000 USD - 241,500 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and benefits.
- Location: Santa Clara, California, United States.
- Applications accepted at least until February 22, 2026.
- This posting is for an existing vacancy. NVIDIA uses AI tools in recruiting processes and is an equal opportunity employer.