Senior Software Engineer, AI Inference

at Nvidia

📍 Toronto, Canada

CAD 135,000-220,000 per year

SENIOR

✅ Hybrid

Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and know the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of, and adeptness in, all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 4, Communication @ 7, Debugging @ 4, OSS @ 4, LLM @ 4, GPU @ 4, AI @ 7, Profiling @ 4, vLLM @ 4, Slurm @ 4, SGLang @ 4, HPC @ 6, Performance Analysis @ 3

Details

Help push the boundaries of AI inference at NVIDIA by combining deep systems knowledge with hands-on customer engagement. You will profile real deployments, benchmark across GPU clusters, and turn insights into improvements that benefit customers and open-source projects such as vLLM.

Responsibilities

Partner directly with customer engineering teams through long-term technical engagements to understand LLM serving architectures and performance goals.
Design and implement end-to-end benchmarking campaigns across Kubernetes and Slurm environments to surface actionable insights.
Set up and operate vLLM serving deployments on GPU clusters; tune configurations for throughput, latency, and efficiency.
Collect Nsight Systems / Nsight Compute profiling traces to identify performance gaps relative to reference frameworks.
Develop detailed performance plans based on profiling findings and collaborate with NVIDIA kernel engineering and OSS vLLM teams to drive improvements.
Build internal tools, benchmarking harnesses, and automation pipelines to raise team and customer productivity.
Document architectures, findings, and recommendations for technical audiences and contribute improvements back to vLLM and related open-source projects.

Requirements

Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, or equivalent experience.
5+ years of industry experience building and operating complex, production-grade software systems, with strong instincts for how systems behave at scale.
Hands-on experience deploying and operating LLM inference workloads, particularly with vLLM, including configuration, optimization, and debugging in real environments.
Proficiency with container orchestration (Kubernetes) and HPC scheduling (Slurm) for GPU-accelerated workloads.
Solid understanding of LLM serving fundamentals: batching strategies (continuous batching, chunked prefill), KV cache management, and tensor/pipeline parallelism.
Familiarity with GPU performance analysis: memory hierarchy, utilization, roofline modeling, and profiling with Nsight Systems or Nsight Compute.
Strong written and verbal communication skills; ability to present technical findings clearly and navigate ambiguous, open-ended customer problems.

Ways to Stand Out

Experience with NVIDIA Dynamo or other disaggregated inference serving frameworks.
Contributions to open-source inference or ML systems projects (particularly vLLM or SGLang).
Background with ML compilers or GPU kernel development (Triton, CUTLASS, TorchInductor).
Experience building developer tools or internal platforms that improved team productivity.
Prior experience in a customer-facing or forward-deployed engineering capacity within a technical product organization.

Compensation & Benefits

Base salary ranges: 135,000 CAD - 185,000 CAD for Level 3, and 170,000 CAD - 220,000 CAD for Level 4.
Eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).

Additional Information

Applications accepted at least until April 14, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.