DL Performance Software Engineer - LLM Inference

at NVIDIA

📍 Toronto, Canada

CAD 93,750-201,500 per year

MIDDLE

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 3
GitHub @ 3
Distributed Systems @ 3
Machine Learning @ 3
Leadership @ 3
Parallel Programming @ 3
Performance Optimization @ 3
Debugging @ 3
LLM @ 3
PyTorch @ 3
CUDA @ 3
GPU @ 3

Details

At NVIDIA, we believe artificial intelligence (AI) will fundamentally transform how people live and work. Our mission is to advance AI research and development to create groundbreaking technologies that enable anyone to harness the power of AI and benefit from its potential. Our team consists of experts in AI, systems and performance optimization. Our leadership includes world-renowned experts in AI systems who have received multiple academic and industry research awards.

As a member of the LLM inference team, you will help build innovative software that makes LLM inference more efficient, scalable, and accessible. You will architect and implement inference stacks, and collaborate with teams working on resource orchestration, distributed systems, inference engine optimization, and high-performance GPU kernels.

Responsibilities

Write safe, scalable, modular, and high-quality C++ and Python code for core backend software for LLM inference.
Perform benchmarking, profiling, and system-level programming for GPU applications.
Provide code reviews, design documents, and tutorials to facilitate collaboration across the team.
Conduct unit tests and performance tests for different stages of the inference pipeline.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent experience.
Strong coding skills in Python and C/C++.
2+ years of industry experience in software engineering or equivalent research experience.
Knowledgeable and passionate about machine learning and performance engineering.
Proven project experience building software where performance is a core offering.

Ways to stand out

Solid fundamentals in machine learning, deep learning, operating systems, computer architecture, and parallel programming.
Research experience in systems or machine learning.
Project experience in modern deep learning software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.
Experience with performance modeling, profiling, debugging, or code optimization, or architectural knowledge of CPUs and GPUs.

We strongly encourage you to include sample projects (e.g., GitHub) that demonstrate the qualifications above.

You will also be eligible for equity and benefits. Applications for this job will be accepted at least until September 2, 2025.

Compensation

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 93,750 CAD - 162,500 CAD for Level 2, and 116,250 CAD - 201,500 CAD for Level 3.