Senior Deep Learning Manager, LLM Inference

at NVIDIA
USD 272,000-425,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Marketing @ 4, Software Development @ 8, Python @ 3, Leadership @ 4, Communication @ 4, OSS @ 3, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 4

Details

NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team focuses on advanced inference server performance for Large Language Models (LLMs). This role leads a team that characterizes the latest LLMs and inference servers, profiles GPU kernel-level performance, develops profiling and analysis tools, and contributes to deep learning software projects to ensure NVIDIA maintains leadership in inference performance.

Responsibilities

  • Manage a team that characterizes the latest LLMs and inference servers such as TensorRT-LLM, vLLM, and SGLang.
  • Work with the performance marketing team to create content (blog posts and other written materials) highlighting TensorRT-LLM performance.
  • Collaborate with engineers from AI startups to debug issues and establish standard methodologies.
  • Profile GPU kernel-level performance to identify hardware and software optimization opportunities.
  • Develop profiling and analysis software tools to keep pace with network scaling.
  • Contribute to deep learning software projects, including PyTorch, TensorRT-LLM, vLLM, and SGLang.
  • Verify that TensorRT-LLM performance meets expectations for new GPU product launches.
  • Collaborate across software, research, and product teams to guide the direction of inference serving and ensure world-class performance.

Requirements

  • Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • 10+ years of overall software development experience and at least 3 years of management experience.
  • Detailed knowledge of deep learning inference serving and PyTorch programming.
  • Experience with profiling and compiler optimizations.
  • Proficiency in Python and C++ and familiarity with CUDA.
  • Experience with LLMs and their performance challenges and opportunities.
  • Solid understanding of CPU and GPU microarchitecture and performance characteristics.
  • Experience with complex software projects such as frameworks, compilers, or operating systems.
  • Good written and verbal communication skills, and the ability to work both independently and collaboratively in a fast-paced environment.

Ways to Stand Out

  • Demonstrated drive to continuously improve software and hardware performance.
  • Examples of novel use cases for agentic AI tools in the workplace.
  • Experience with database and visualization tools such as D3.js.

Additional Details

  • Familiarity with terms and technologies mentioned in the role: pre-fill phase, generation phase, paged attention, MoE, Tensor Parallel, Llama, GPT-OSS, and HuggingFace.
  • Base salary range: 272,000 USD - 425,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
  • Eligible for equity and company benefits.
  • Applications accepted at least until September 6, 2025.

NVIDIA values diversity and is an equal opportunity employer.