Manager, Large Language Model Inference

at Nvidia
USD 184,000-356,500 per year
Seniority: Middle
✅ Hybrid


Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development, Python, Leadership, API, Technical Leadership, Engineering Management, LLM, CUDA, GPU

Details

At NVIDIA, the TensorRT inference platform delivers optimized deployment of deep learning models on NVIDIA GPUs. This role is a hands-on engineering management position focused on building the next generation of LLM/VLM/VLA inference software technologies, working closely with researchers, GPU architects, and cross-functional teams to ship production-grade, high-performance inference software.

Responsibilities

  • Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
  • Drive the design, development, and delivery of production inference software targeting NVIDIA's next-generation enterprise and edge hardware platforms.
  • Integrate cutting-edge technologies developed at NVIDIA and provide an intuitive developer experience for LLM deployment.
  • Lead software development execution: project planning, milestone delivery, and cross-functional coordination.
  • Interface with NVIDIA researchers, GPU architects, and other teams to ensure production-grade, high-performance software.

Requirements

  • MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
  • 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
  • Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
  • Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
  • Demonstrated expertise in large language models (LLM) and/or vision language models (VLM).

Ways to stand out / preferred:

  • Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
  • Background in LLM inference or experience with frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
  • Proven track record of growing and managing teams that encourage idea sharing and professional growth.

Benefits

  • Base salary (location-, level-, and experience-dependent). Listed base salary ranges:
    • Level 2: 184,000 USD - 287,500 USD
    • Level 3: 224,000 USD - 356,500 USD
  • Eligibility for equity and additional benefits (see NVIDIA benefits pages).
  • Hybrid work arrangement (#LI-Hybrid).

Additional information

  • Location provided: Santa Clara, CA, United States.
  • Applications accepted at least until November 4, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.