Senior Software Engineer, AI Inference Systems

at NVIDIA
📍 Toronto, Canada
CAD 142,500-318,500 per year
SENIOR
✅ Hybrid

Required Skills & Competences

Docker (3), Go (1), Kubernetes (3), Linux (3), Python (1), GCP (4), GitHub (4), CI/CD (4), Algorithms (4), Data Structures (4), Distributed Systems (4), AWS (4), Azure (4), Communication (4), Parallel Programming (4), Rust (1), Debugging (4), LLM (4), PyTorch (4), CUDA (3), GPU (4)

Details

We are seeking highly skilled and motivated software engineers to build AI inference systems that serve large-scale models with extreme efficiency. You will architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments. You will collaborate across inference, compiler, scheduling, and performance teams to push the frontier of accelerated computing for AI.

Responsibilities

  • Contribute features to vLLM to support the newest models and NVIDIA GPU hardware features.
  • Profile and optimize the inference framework (vLLM) using methods such as speculative decoding, data/tensor/expert/pipeline-parallelism, and prefill-decode disaggregation.
  • Develop, optimize, and benchmark GPU kernels (hand-tuned and compiler-generated) using techniques such as fusion, autotuning, and memory/layout optimization.
  • Build and extend high-level DSLs and compiler infrastructure to boost kernel developer productivity and approach peak hardware utilization.
  • Define and build inference benchmarking methodologies and tools; contribute new benchmarks and NVIDIA submissions to the MLPerf Inference benchmarking suite.
  • Architect scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across clouds.
  • Conduct and publish original research that advances ML systems, and integrate research prototypes into NVIDIA software products.

Requirements

  • Bachelor’s degree (or equivalent experience) in Computer Science, Computer Engineering, or Software Engineering with 7+ years of experience; or Master’s degree with 5+ years; or PhD with thesis and top-tier publications in ML Systems, GPU architecture, or high-performance computing.
  • Strong programming skills in Python and C/C++; experience with Go or Rust is a plus.
  • Solid CS fundamentals: algorithms & data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
  • Experience in performance engineering for ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM, SGLang).
  • Familiarity with GPU programming and performance: CUDA, memory hierarchy, streams, NCCL; proficiency with profiling/debug tools (e.g., Nsight Systems/Compute).
  • Experience with containers and orchestration (Docker, Kubernetes, Slurm); familiarity with Linux namespaces and cgroups.
  • Excellent debugging, problem-solving, and communication skills; ability to excel in a fast-paced, cross-functional environment.

Ways to Stand Out

  • Experience building and optimizing LLM inference engines (e.g., vLLM, SGLang).
  • Hands-on work with ML compilers and DSLs (e.g., Triton, TorchDynamo/Inductor, MLIR/LLVM, XLA), GPU libraries (e.g., CUTLASS) and features (e.g., CUDA Graph, Tensor Cores).
  • Experience contributing to containerization/virtualization technologies such as containerd, CRI-O, or CRIU.
  • Experience with cloud platforms (AWS, GCP, Azure), infrastructure-as-code, CI/CD, and production observability.
  • Contributions to open-source projects and/or publications (include links to GitHub PRs, papers, artifacts).

Compensation & Benefits

  • Base salary is location- and experience-dependent. Ranges provided:
    • Level 4: 142,500 CAD - 247,000 CAD
    • Level 5: 183,750 CAD - 318,500 CAD
  • Eligible for equity and company benefits.

Additional Information

  • Hybrid role (#LI-Hybrid).
  • Applications accepted at least until November 24, 2025.