Senior Systems Software Engineer - Deep Learning Solutions

at Nvidia

📍 Toronto, Canada

CAD 225,000-275,000 per year

SENIOR

✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others, and you know all the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Linux @ 4
Parallel Programming @ 4
Performance Optimization @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
Robotics @ 4
TensorRT @ 4

Details

NVIDIA is a global leader in physical AI, powering self-driving cars, humanoid robots, intelligent environments, and medical devices. This is a hands-on senior engineering role focused on deep learning inference optimization for autonomous vehicles and robotics on edge hardware. The work spans model-level analysis, kernel and runtime optimization, compiler interactions, and deployment on resource-constrained SoCs and GPUs.

Responsibilities

Engage directly with automotive OEMs, robotics partners, and internal hardware teams to analyze, debug, and optimize deep learning models on NVIDIA platforms, delivering production-ready solutions.
Own performance benchmarking efforts (MLPerf Edge and industry benchmarks): define methodology, ensure reproducibility, and drive actionable optimization priorities.
Inspect model architectures at the operator/kernel level and uncover performance bottlenecks through kernel traces and profiling.
Evaluate emerging model architectures (vision encoders, multi-modal VLMs, hybrid SSM-Transformer backbones, diffusion/flow-matching decoders, multi-camera tokenizers) for compilation feasibility, memory footprint, and latency on target SoCs.
Collaborate with compiler, runtime, and hardware teams to bridge model-level insights with platform capabilities.
Contribute to build reviews and roadmap priorities informed by customer workload patterns.
Represent NVIDIA externally at conferences, webinars, and partner events; share optimization expertise and contribute guidelines and best practices.
Develop and deploy TensorRT and compiler-stack inference solutions for edge platforms (Jetson, DRIVE, GPU + ARM), create Proofs of Readiness (PORs), and work with compiler teams on Torch-TRT, MLIR-TRT, and related frameworks.

Requirements

Master's degree or equivalent experience in Computer Science, Electrical Engineering, or a related field.
12+ years of industry experience, including more than 8 years in deep learning model optimization, inference engineering, or neural network compilation.
5+ years of validated experience in embedded/edge software delivering production inference solutions in power- and latency-constrained environments.
Deep knowledge of modern DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language frameworks, and experience with diffusion models and/or state space models.
Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization in heterogeneous computing environments.
Experience with TensorRT, compiler IRs, or equivalent inference optimization toolchains.
Solid understanding of embedded operating system internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts.
Background in parallel programming (e.g., CUDA, OpenMP) and experience reasoning about memory hierarchies, data movement, and compute utilization.
Demonstrated ability to collaborate directly with external partners and customers to solve workload and performance problems within production constraints.

Preferred / Ways to Stand Out

Experience with ML compiler frameworks (TVM, MLIR, XLA, Triton) or contributions to inference runtime development.
Production deployment experience with autonomous vehicle perception or planning stacks and understanding of the full pipeline from sensors to trajectory output.
Familiarity with Physical AI model families (VLM + action expert architectures, end-to-end driving models, robot foundation models).
Contributions to MLPerf benchmarks and large-scale industry performance optimization efforts.
Experience with automotive safety standards (ISO 26262, SOTIF) and their implications for inference systems.
Experience leading technical initiatives across globally distributed teams.

Compensation & Benefits

Base salary range: 225,000 CAD - 275,000 CAD (determined by location, experience, and pay of employees in similar positions).
Eligible for equity and company benefits (link referenced in original posting).

Other

Applications accepted at least until March 2, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.