Required Skills & Competences
Marketing @ 4, Software Development @ 8, Python @ 3, Leadership @ 4, Communication @ 4, OSS @ 3, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 4
Details
NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team focuses on advanced inference server performance for Large Language Models (LLMs). This role leads a team that characterizes the latest LLMs and inference servers, profiles GPU kernel-level performance, develops profiling and analysis tools, and contributes to deep learning software projects to ensure NVIDIA maintains leadership in inference performance.
Responsibilities
- Manage a team that characterizes the latest LLMs and inference servers such as TensorRT-LLM, vLLM, and SGLang.
- Work with the performance marketing team to create content (blog posts and other written materials) highlighting TensorRT-LLM performance.
- Collaborate with engineers from AI startups to debug issues and establish standard methodologies.
- Profile GPU kernel-level performance to identify hardware and software optimization opportunities.
- Develop profiling and analysis software tools to keep pace with network scaling.
- Contribute to deep learning software projects, including PyTorch, TRT-LLM, vLLM, and SGLang.
- Verify that TRT-LLM performance meets expectations for new GPU product launches.
- Collaborate across software, research, and product teams to guide the direction of inference serving and ensure world-class performance.
Requirements
- Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- 10+ years of overall software development experience and at least 3 years of management experience.
- Detailed knowledge of deep learning inference serving and PyTorch programming.
- Experience with profiling and compiler optimizations.
- Proficiency in Python and C++ and familiarity with CUDA.
- Experience with LLMs and their performance challenges and opportunities.
- Solid understanding of CPU and GPU microarchitecture and performance characteristics.
- Experience with complex software projects such as frameworks, compilers, or operating systems.
- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.
Ways to Stand Out
- Demonstrated drive to continuously improve software and hardware performance.
- Examples of novel use cases for agentic AI tools in the workplace.
- Experience with database and visualization tools such as D3.js.
Additional Details
- Familiarity with terms and technologies mentioned in the role: pre-fill phase, generation phase, paged attention, MoE, Tensor Parallel, Llama, GPT-OSS, and HuggingFace.
- Base salary range: 272,000 USD - 425,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and company benefits.
- Applications accepted at least until September 6, 2025.
NVIDIA values diversity and is an equal opportunity employer.