DL System Software Engineer - AI Platform

at NVIDIA
📍 Toronto, Canada
CAD 116,250-201,500 per year
MIDDLE
✅ On-site

Required Skills & Competences

Go @ 6, Kubernetes @ 3, Linux @ 5, Python @ 6, Algorithms @ 3, Data Structures @ 3, Rust @ 6, CUDA @ 3, GPU @ 3

Details

We are seeking highly motivated and skilled systems engineers to join our team and help develop an AI platform that provides efficient infrastructure for inference and training of large-scale models. As a systems engineer, you will play a crucial role in building a unified solution that brings NVIDIA technologies such as high-performance inference/training frameworks, ML compilers, performance predictors, and cluster schedulers into a single, cohesive platform.

Responsibilities

  • Contribute to the development of NVIDIA's AI platform for training, fine-tuning, and serving state-of-the-art AI models with optimal performance and efficiency.
  • Design and build solutions for scheduling large-scale AI training and inference workloads on GPU clusters across multiple cloud infrastructures.
  • Explore and develop solutions for open problems such as industry-scale resource management, GPU scheduling, performance prediction, and live workload migration.
  • Collaborate with and contribute to adjacent teams and components (e.g., TensorRT/Dynamo inference engine, ML compiler, KAI/Grove scheduler, Lepton cloud).

Requirements

  • Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or a relevant technical field.
  • 5+ years of professional experience.
  • Experience building large-scale systems from scratch; prior experience with container-based deployment systems (for example, Kubernetes) is beneficial.
  • Strong coding skills in one or more of: Python, Go, Rust, and/or C/C++.
  • Solid foundation in computer science topics: algorithms and data structures, operating systems, computer architecture.
  • Strong ability to quickly grasp new concepts and thrive in evolving situations.
  • Strong understanding of AI and related technologies is a significant plus.

Ways to Stand Out

  • Graduate-level education or relevant research/practical background.
  • Practical experience building and optimizing AI applications.
  • Proficiency with container internals and runtimes (containerd, CRI-O, Linux namespaces, CRIU).
  • Experience with NVIDIA GPU technologies such as CUDA graphs and NVIDIA driver/runtime internals.

Compensation & Benefits

  • Base salary range: 116,250 CAD - 201,500 CAD (determined based on location, experience, and internal pay parity).
  • Eligible for equity and NVIDIA benefits.

Other Details

  • Location: Toronto, Canada.
  • Employment type: Full time.
  • Applications accepted at least until September 6, 2025.