Senior Systems Software Engineer - Deep Learning Solutions

at NVIDIA
📍 Toronto, Canada
CAD 225,000-275,000 per year
SENIOR
✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

  • Linux @ 4
  • Parallel Programming @ 4
  • Performance Optimization @ 4
  • CUDA @ 4
  • GPU @ 4
  • Deep Learning @ 4
  • AI @ 4
  • Profiling @ 4
  • Robotics @ 4
  • TensorRT @ 4

Details

NVIDIA is a global leader in physical AI, powering self-driving cars, humanoid robots, intelligent environments, and medical devices. This is a hands-on senior engineering role focused on deep learning inference optimization for autonomous vehicles and robotics on edge hardware. The work spans model-level analysis, kernel and runtime optimization, compiler interactions, and deployment on resource-constrained SoCs and GPUs.

Responsibilities

  • Engage directly with automotive OEMs, robotics partners, and internal hardware teams to analyze, debug, and optimize deep learning models on NVIDIA platforms, delivering production-ready solutions.
  • Own performance benchmarking efforts (MLPerf Edge and industry benchmarks), define methodology, ensure reproducibility, and drive actionable optimization priorities.
  • Inspect model architectures at the operator/kernel level and uncover performance bottlenecks through kernel traces and profiling.
  • Evaluate emerging model architectures (vision encoders, multi-modal VLMs, hybrid SSM-Transformer backbones, diffusion/flow matching decoders, multi-camera tokenizers) for compilation feasibility, memory footprint, and latency on target SoCs.
  • Collaborate with compiler, runtime, and hardware teams to bridge model-level insights with platform capabilities.
  • Contribute to build reviews and roadmap priorities informed by customer workload patterns.
  • Represent NVIDIA externally at conferences, webinars, and partner events; share optimization expertise and contribute guidelines and best practices.
  • Develop and deploy TensorRT and compiler-stack inference solutions for edge platforms (Jetson, DRIVE, GPU + ARM), create Proofs of Readiness (PORs), and work with compiler teams on Torch-TRT, MLIR-TRT, and related frameworks.

Requirements

  • Master's degree or equivalent experience in Computer Science, Electrical Engineering, or a related field.
  • 12+ years of industry experience, including more than 8 years in deep learning model optimization, inference engineering, or neural network compilation.
  • 5+ years of validated experience in embedded/edge software delivering production inference solutions in power- and latency-constrained environments.
  • Deep knowledge of modern DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language frameworks, and experience with diffusion models and/or state space models.
  • Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization using heterogeneous computing.
  • Experience with TensorRT, compiler IRs, or equivalent inference optimization toolchains.
  • Solid understanding of embedded operating system internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts.
  • Background in parallel programming (e.g., CUDA, OpenMP) and experience reasoning about memory hierarchies, data movement, and compute utilization.
  • Demonstrated ability to collaborate directly with external partners and customers to solve workload and performance problems within production constraints.

Preferred / Ways to Stand Out

  • Experience with ML compiler frameworks (TVM, MLIR, XLA, Triton) or contributions to inference runtime development.
  • Production deployment experience with autonomous vehicle perception or planning stacks and understanding of the full pipeline from sensors to trajectory output.
  • Familiarity with Physical AI model families (VLM + action expert architectures, end-to-end driving models, robot foundation models).
  • Contributions to MLPerf benchmarks and large-scale industry performance optimization efforts.
  • Experience with automotive safety standards (ISO 26262, SOTIF) and their implications for inference systems.
  • Experience leading technical initiatives across globally distributed teams.

Compensation & Benefits

  • Base salary range: 225,000 CAD - 275,000 CAD (determined by location, experience, and pay of employees in similar positions).
  • Eligible for equity and company benefits (link referenced in original posting).

Other

  • Applications accepted at least until March 2, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.