Software Engineer, AI Systems - vLLM and MLPerf

at NVIDIA
📍 Toronto, Canada
CAD 116,250-247,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 5, Docker @ 3, Kubernetes @ 3, DevOps @ 3, Python @ 5, GCP @ 3, GitHub @ 3, CI/CD @ 3, Algorithms @ 3, Distributed Systems @ 6, AWS @ 3, Azure @ 3, Communication @ 3, Planning @ 3, API @ 3, LLM @ 3, PyTorch @ 3, CUDA @ 2, GPU @ 2

Details

We are seeking highly skilled and motivated software engineers to join the vLLM & MLPerf team at NVIDIA. You'll define and build benchmarks for MLPerf Inference (the industry-leading benchmark suite for inference system-level performance), contribute to vLLM, and optimize performance for these benchmarks on bleeding-edge NVIDIA GPUs.

Responsibilities

  • Design and implement highly efficient inference systems for large-scale deployments of generative AI models.
  • Define inference benchmarking methodologies and build tools that will be adopted across the industry (MLPerf Inference).
  • Develop, profile, debug, and optimize low-level system components and algorithms to improve throughput and minimize latency for the MLPerf Inference benchmarks on cutting-edge NVIDIA GPUs.
  • Productionize inference systems with uncompromised software quality.
  • Collaborate with researchers and engineers to productionize innovative model architectures, inference techniques, and quantization methods.
  • Contribute to the design of APIs, abstractions, and UX that make it easier to scale model deployment while maintaining usability and flexibility.
  • Participate in design discussions, code reviews, and technical planning to align the product with business goals.
  • Stay up to date with the latest advancements; propose novel research ideas in inference system-level optimization and translate them into practical, robust systems. Explorations and academic publications are encouraged.

Requirements

  • Bachelor’s, Master’s, or PhD in Computer Science/Engineering, Software Engineering, a related field, or equivalent experience.
  • 5+ years of software development experience, preferably with Python and C++.
  • Deep understanding of deep learning algorithms, distributed systems, parallel computing, and high-performance computing principles.
  • Hands-on experience with ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
  • Experience optimizing compute, memory, and communication performance for deployments of large models.
  • Familiarity with GPU programming, CUDA, NCCL, and performance profiling tools.
  • Ability to work closely with research and engineering teams, translating state-of-the-art research ideas into concrete designs and robust code.
  • Excellent problem-solving skills and ability to debug complex systems.
  • A passion for building high-impact software that pushes the boundaries of large-scale AI.

Ways to stand out

  • Background in building and optimizing LLM inference engines such as vLLM and SGLang.
  • Experience building ML compilers such as Triton or TorchDynamo/TorchInductor.
  • Experience with cloud platforms (AWS, GCP, Azure), containerization (Docker), and orchestration/infrastructure (Kubernetes, Slurm).
  • Exposure to DevOps practices, CI/CD pipelines, and infrastructure as code.
  • Contributions to open-source projects (please provide GitHub PRs).

Benefits & Compensation

  • Base salary ranges (location/level dependent):
    • Level 3: 116,250 CAD - 201,500 CAD
    • Level 4: 142,500 CAD - 247,000 CAD
  • You will also be eligible for equity and benefits (see NVIDIA benefits).

Additional information

  • Location: Toronto, Canada (hybrid).
  • Applications accepted at least until October 12, 2025.