Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others, and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Kubernetes @ 4
Communication @ 7
Debugging @ 4
OSS @ 4
LLM @ 4
GPU @ 4
AI @ 7
Profiling @ 4
vLLM @ 4
Slurm @ 4
SGLang @ 4
HPC @ 6
Performance Analysis @ 3
Details
Help push the boundaries of AI inference at NVIDIA by combining deep systems knowledge with hands-on customer engagement. You will profile real deployments, benchmark across GPU clusters, and turn insights into improvements that benefit customers and open-source projects such as vLLM.
Responsibilities
- Partner directly with customer engineering teams through long-term technical engagements to understand LLM serving architectures and performance goals.
- Design and implement end-to-end benchmarking campaigns across Kubernetes and Slurm environments to surface actionable insights.
- Set up and operate vLLM serving deployments on GPU clusters; tune configurations for throughput, latency, and efficiency.
- Collect Nsight Systems / Nsight Compute profiling traces to identify performance gaps relative to reference frameworks.
- Develop detailed performance plans based on profiling findings and collaborate with NVIDIA kernel engineering and OSS vLLM teams to drive improvements.
- Build internal tools, benchmarking harnesses, and automation pipelines to raise team and customer productivity.
- Document architectures, findings, and recommendations for technical audiences and contribute improvements back to vLLM and related open-source projects.
Requirements
- Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, or equivalent experience.
- 5+ years of industry experience building and operating complex, production-grade software systems with strong instincts for systems at scale.
- Hands-on experience deploying and operating LLM inference workloads, particularly with vLLM, including configuration, optimization, and debugging in real environments.
- Proficiency with container orchestration (Kubernetes) and HPC scheduling (Slurm) for GPU-accelerated workloads.
- Solid understanding of LLM serving fundamentals: batching strategies (continuous batching, chunked prefill), KV cache management, and tensor/pipeline parallelism.
- Familiarity with GPU performance analysis: memory hierarchy, utilization, roofline modeling, and profiling with Nsight Systems or Nsight Compute.
- Strong written and verbal communication skills; ability to present technical findings clearly and navigate ambiguous, open-ended customer problems.
Ways to Stand Out
- Experience with NVIDIA Dynamo or other disaggregated inference serving frameworks.
- Contributions to open-source inference or ML systems projects (particularly vLLM or SGLang).
- Background with ML compilers or GPU kernel development (Triton, CUTLASS, TorchInductor).
- Experience building developer tools or internal platforms that improved team productivity.
- Prior experience in a customer-facing or forward-deployed engineering capacity within a technical product organization.
Compensation & Benefits
- Base salary ranges: 135,000 CAD - 185,000 CAD for Level 3, and 170,000 CAD - 220,000 CAD for Level 4.
- Eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).
Additional Information
- Applications are accepted until at least April 14, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- #LI-Hybrid