Manager, Large Language Model Inference

at Nvidia
USD 184,000-356,500 per year
Seniority: Middle
✅ Hybrid


Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development, Python, Leadership, API, Technical Leadership, Engineering Management, LLM, CUDA, GPU

Details

At NVIDIA, the TensorRT inference platform delivers optimized deployment of deep learning models on NVIDIA GPUs. This role is a hands-on engineering management position focused on building the next generation of LLM/VLM/VLA inference software technologies, working closely with researchers, GPU architects, and cross-functional teams to ship production-grade, high-performance inference software.

Responsibilities

  • Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
  • Drive the design, development, and delivery of production inference software targeting NVIDIA's next-generation enterprise and edge hardware platforms.
  • Integrate cutting-edge technologies developed at NVIDIA and provide an intuitive developer experience for LLM deployment.
  • Lead software development execution: project planning, milestone delivery, and cross-functional coordination.
  • Interface with NVIDIA researchers, GPU architects, and other teams to ensure production-grade, high-performance software.

Requirements

  • MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
  • 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
  • Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
  • Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
  • Demonstrated expertise in large language models (LLM) and/or vision language models (VLM).

Ways to stand out / preferred:

  • Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
  • Background in LLM inference or experience with frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
  • Proven track record of growing and managing teams that encourage idea sharing and professional growth.

Benefits

  • Base salary (location-, level-, and experience-dependent). Listed base salary ranges:
    • Level 2: 184,000 USD - 287,500 USD
    • Level 3: 224,000 USD - 356,500 USD
  • Eligibility for equity and additional benefits (see NVIDIA benefits pages).
  • Hybrid work arrangement (#LI-Hybrid).

Additional information

  • Location provided: Santa Clara, CA, United States.
  • Applications accepted at least until November 4, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.