Engineering Manager, LLM Performance

at Nvidia

📍 Santa Clara, United States

USD 224,000-431,250 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Software Development @ 3, Python @ 6, Leadership @ 3, API @ 3, Technical Leadership @ 6, LLM @ 3, CUDA @ 6, GPU @ 3, AI @ 3, vLLM @ 3, TensorRT @ 3, SGLang @ 3

Details

At NVIDIA, we aren't just powering the AI revolution; we're accelerating it. We are speeding up LLM inference across the stack and across open source LLM frameworks such as TensorRT LLM, vLLM, and SGLang. With demand for AI exploding, particularly for large language models (LLMs), vision language models (VLMs), and vision-language-action models (VLAs), we are significantly expanding our team.

We’re seeking a highly skilled and driven Engineering Manager to take the lead in accelerating the next generation of LLM/VLM/VLA inference software technologies that will define the future of AI. This is a high-impact, hands-on leadership role at the intersection of deep technical expertise and world-class management. You won't just manage; you'll architect and guide a brilliant team of engineers who are pushing the performance of LLM inference. Your work will be highly collaborative, interfacing directly with NVIDIA Researchers, GPU Architects, and other teams across the company to ensure we ship production-grade, lightning-fast software that sets the global standard for AI performance.

Responsibilities

- Lead and grow a team responsible for pushing the performance of LLM inference across multiple LLM frameworks, including TensorRT LLM, vLLM, SGLang, and Dynamo, on our datacenter products.
- Drive the design, implementation, and optimization of features that are key to performance in LLM inference.
- Continuously improve the performance of LLM inference on current and upcoming NVIDIA datacenter architectures and GPUs.
- Continuously improve the performance of LLM inference for important foundation models.
- Work with inference benchmark teams to help tune performance for key workloads.
- Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment.
- Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination.

Requirements

- MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
- 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
- Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
- Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
- Demonstrated expertise in large language models (LLMs), vision language models (VLMs), or inference in general.

Ways to Stand Out

- Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
- Background in LLM inference or working with frameworks such as TensorRT LLM, vLLM, or SGLang.
- Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
- Proven track record of growing and managing a team that encourages idea sharing and professional growth.

Compensation & Benefits

Base salary ranges provided by location and level:
- Level 3: 224,000 USD - 356,500 USD
- Level 4: 272,000 USD - 431,250 USD
You will also be eligible for equity and benefits (link to NVIDIA benefits).

Additional Information

Location: Santa Clara, CA, United States.
Applications for this job will be accepted at least until June 27, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity and inclusion.