Senior Software Engineer, AI Systems - vLLM and MLPerf
at NVIDIA
USD 184,000-356,500 per year
Required Skills & Competences
Software Development @ 6, Docker @ 4, Kubernetes @ 4, DevOps @ 4, Python @ 6, GCP @ 4, GitHub @ 4, CI/CD @ 4, Algorithms @ 4, Distributed Systems @ 7, AWS @ 4, Azure @ 4, Communication @ 4, Planning @ 4, API @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 3
Details
We are seeking highly skilled and motivated software engineers to join our vLLM & MLPerf team. You will define and build benchmarks for MLPerf Inference, the industry-leading benchmark suite for system-level inference performance. You will also contribute to vLLM and push its performance on those benchmarks as far as possible on NVIDIA's latest GPUs.
Responsibilities
- Design and implement highly efficient inference systems for large-scale deployments of generative AI models.
- Define inference benchmarking methodologies and build tools that will be embraced across the industry (MLPerf Inference).
- Develop, profile, debug, and optimize low-level system components and algorithms to enhance throughput and latency for MLPerf Inference benchmarks on the newest NVIDIA GPUs.
- Productionize inference systems with uncompromised software quality.
- Collaborate with researchers and engineers to productionize trending model architectures, inference techniques, and quantization methods.
- Contribute to the design of APIs, abstractions, and UX to make it easier to scale model deployment while maintaining usability and flexibility.
- Participate in design discussions, code reviews, and technical planning to ensure alignment with business goals.
- Stay up to date with the latest advancements in inference system-level optimization and translate research ideas into practical, robust systems. Exploratory work and academic publications are encouraged.
Requirements
- Bachelor's, Master's, or PhD degree in Computer Science/Engineering, Software Engineering, a related field, or equivalent experience.
- 5+ years of experience in software development, preferably with Python and C++.
- Deep understanding of deep learning algorithms, distributed systems, parallel computing, and high-performance computing principles.
- Hands-on experience with ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
- Experience optimizing compute, memory, and communication performance for deployments of large models.
- Familiarity with GPU programming (CUDA), NCCL, and performance profiling tools.
- Ability to work closely with research and engineering teams to translate pioneering research into concrete designs and robust code.
- Excellent problem-solving skills, with the ability to debug sophisticated systems.
- Passion for building high-impact software that pushes the boundaries of large-scale AI.
Ways to Stand Out
- Background building and optimizing LLM inference engines such as vLLM and SGLang.
- Experience building ML compilers such as Triton and TorchDynamo/TorchInductor.
- Experience working with cloud platforms (AWS, GCP, or Azure), containerization tools (Docker), and orchestration infrastructures (Kubernetes, Slurm).
- Exposure to DevOps practices, CI/CD pipelines, and infrastructure as code.
- Contributions to open-source projects (please provide a list of GitHub PRs you submitted).
Compensation & Benefits
- Base salary ranges:
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- You will also be eligible for equity and benefits.
Additional Information
- Location: Santa Clara, California, United States (hybrid).
- Applications for this job will be accepted at least until October 12, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.