Senior Deep Learning Manager, LLM Inference

at NVIDIA

📍 Santa Clara, United States

USD 272,000-425,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Marketing @ 4
Software Development @ 8
Python @ 3
Leadership @ 4
Communication @ 4
OSS @ 3
LLM @ 4
PyTorch @ 4
CUDA @ 3
GPU @ 4

Details

NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team focuses on advanced inference server performance for Large Language Models (LLMs). This role leads a team that characterizes the latest LLMs and inference servers, profiles GPU kernel-level performance, develops profiling and analysis tools, and contributes to deep learning software projects to ensure NVIDIA maintains leadership in inference performance.

Responsibilities

Manage a team that characterizes the latest LLMs and inference servers such as TensorRT-LLM, vLLM, and SGLang.
Work with the performance marketing team to create content (blog posts and other written materials) highlighting TensorRT-LLM performance.
Collaborate with engineers from AI startups to debug issues and establish standard methodologies.
Profile GPU kernel-level performance to identify hardware and software optimization opportunities.
Develop profiling and analysis software tools to keep pace with network scaling.
Contribute to deep learning software projects, including PyTorch, TensorRT-LLM, vLLM, and SGLang.
Verify that TensorRT-LLM performance meets expectations for new GPU product launches.
Collaborate across software, research, and product teams to guide the direction of inference serving and ensure world-class performance.

Requirements

Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
10+ years of overall software development experience and at least 3 years of management experience.
Detailed knowledge of deep learning inference serving and PyTorch programming.
Experience with profiling and compiler optimizations.
Proficiency in Python and C++ and familiarity with CUDA.
Experience with LLMs and their performance challenges and opportunities.
Solid understanding of CPU and GPU microarchitecture and performance characteristics.
Experience with complex software projects such as frameworks, compilers, or operating systems.
Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to Stand Out

Demonstrated drive to continuously improve software and hardware performance.
Examples of novel use cases for agentic AI tools in the workplace.
Experience with database and visualization tools such as D3.js.

Additional Details

Familiarity with the terms and technologies mentioned in the role: prefill phase, generation phase, paged attention, MoE, tensor parallelism, Llama, GPT-OSS, and Hugging Face.
Base salary range: 272,000 USD - 425,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
Eligible for equity and company benefits.
Applications accepted at least until September 6, 2025.

NVIDIA values diversity and is an equal opportunity employer.