Manager, Large Language Model Inference
at Nvidia
Santa Clara, United States
USD 184,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Software Development @ 3, Python @ 6, Leadership @ 6, API @ 3, Technical Leadership @ 6, Engineering Management @ 3, LLM @ 3, CUDA @ 6, GPU @ 3
Details
At NVIDIA, the TensorRT inference platform delivers optimized deployment of deep learning models on NVIDIA GPUs. This role is a hands-on engineering management position focused on building the next generation of LLM/VLM/VLA inference software technologies, working closely with researchers, GPU architects, and cross-functional teams to ship production-grade, high-performance inference software.
Responsibilities
- Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
- Drive the design, development, and delivery of production inference software targeting NVIDIA's next-generation enterprise and edge hardware platforms.
- Integrate cutting-edge technologies developed at NVIDIA and provide an intuitive developer experience for LLM deployment.
- Lead software development execution: project planning, milestone delivery, and cross-functional coordination.
- Interface with NVIDIA researchers, GPU architects, and other teams to ensure production-grade, high-performance software.
Requirements
- MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
- 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
- Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
- Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
- Demonstrated expertise in large language models (LLM) and/or vision language models (VLM).
Ways to stand out / preferred:
- Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
- Background in LLM inference or experience with frameworks such as TensorRT-LLM, vLLM, or SGLang.
- Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
- Proven track record of growing and managing teams that encourage idea sharing and professional growth.
Benefits
- Base salary (location-, level-, and experience-dependent). Listed base salary ranges:
  - Level 2: 184,000 USD - 287,500 USD
  - Level 3: 224,000 USD - 356,500 USD
- Eligibility for equity and additional benefits (see NVIDIA benefits pages).
- Hybrid work arrangement (#LI-Hybrid).
Additional information
- Location provided: Santa Clara, CA, United States.
- Applications accepted at least until November 4, 2025.
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.