Senior Software Engineer, AI Systems - vLLM and MLPerf

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 6, Docker @ 4, Kubernetes @ 4, DevOps @ 4, Python @ 6, GCP @ 4, GitHub @ 4, CI/CD @ 4, Algorithms @ 4, Distributed Systems @ 7, Hiring @ 4, AWS @ 4, Azure @ 4, Communication @ 4, Planning @ 4, API @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 3

Details

We are seeking highly skilled and motivated software engineers to join the vLLM & MLPerf team. You will define and build benchmarks for MLPerf Inference, the industry-leading benchmark suite for system-level inference performance. You will also contribute to vLLM and push its performance on those benchmarks as far as possible on NVIDIA's latest GPUs.

Responsibilities

  • Design and implement highly efficient inference systems for large-scale deployments of generative AI models.
  • Define inference benchmarking methodologies and build tools that will be embraced across the industry.
  • Develop, profile, debug, and optimize low-level system components and algorithms to increase throughput and reduce latency for the MLPerf Inference benchmarks on the newest NVIDIA GPUs.
  • Productionize inference systems with uncompromised software quality.
  • Collaborate with researchers and engineers to productionize trending model architectures, inference techniques, and quantization methods.
  • Contribute to the design of APIs, abstractions, and UX that make it easier to scale model deployment while maintaining usability and flexibility.
  • Participate in design discussions, code reviews, and technical planning to ensure the product aligns with business goals.
  • Stay up to date with the latest advancements in inference system-level optimization, propose novel research ideas, and translate research into practical, robust systems. Exploration and academic publication are encouraged.

Requirements

  • Bachelor's, Master's, or PhD in Computer Science/Engineering, Software Engineering, a related field, or equivalent experience.
  • 5+ years of experience in software development, preferably with Python and C++.
  • Deep understanding of deep learning algorithms, distributed systems, parallel computing, and high-performance computing principles.
  • Hands-on experience with ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM or SGLang).
  • Experience optimizing compute, memory, and communication performance for deployments of large models.
  • Familiarity with GPU programming, CUDA, NCCL, and performance profiling tools.
  • Ability to work closely with both research and engineering teams, translating research ideas into concrete designs and robust code.
  • Excellent problem-solving skills, with the ability to debug sophisticated systems.
  • A passion for building high-impact software that pushes the boundaries of large-scale AI.

Ways to stand out

  • Background in building and optimizing LLM inference engines such as vLLM or SGLang.
  • Experience building ML compilers such as Triton or TorchDynamo/TorchInductor.
  • Experience with cloud platforms (AWS, GCP, or Azure), containerization (Docker), and orchestration (Kubernetes, Slurm).
  • Exposure to DevOps practices, CI/CD pipelines, and infrastructure-as-code.
  • Contributions to open-source projects (applicants are encouraged to provide links to GitHub PRs).

Compensation & Benefits

  • Base salary range by level: Level 4: 184,000 USD - 287,500 USD per year; Level 5: 224,000 USD - 356,500 USD per year.
  • Eligible for equity and company benefits (link provided in original posting).

Additional information

  • Hybrid role indicator: #LI-Hybrid.
  • Applications accepted at least until October 12, 2025.
  • NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.