DL System Software Engineer - AI Platform

at Nvidia
πŸ“ Toronto, Canada
CAD 116,250-201,500 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Go @ 6, Kubernetes @ 3, Linux @ 5, Python @ 6, Algorithms @ 3, Data Structures @ 3, Rust @ 6, CUDA @ 3, GPU @ 3

Details

We are seeking highly motivated and skilled systems engineers to join a team developing an AI platform that provides efficient infrastructure for inference and training of large-scale models. The role focuses on building a unified solution that integrates NVIDIA technologies (high-performance inference/training frameworks, ML compilers, performance predictors, and cluster schedulers) into a cohesive platform.

Responsibilities

  • Participate in the development of an AI platform for training, fine-tuning, and serving state-of-the-art AI models with optimal performance and efficiency.
  • Design and build solutions for scheduling large-scale AI training and inference workloads on GPU clusters across multiple cloud infrastructures.
  • Explore and find solutions to open problems such as industry-scale resource management, GPU scheduling, performance prediction, and live workload migration.
  • Collaborate with and contribute to adjacent teams and components, including TensorRT/Dynamo inference engine, ML compilers, KAI/Grove scheduler, and Lepton cloud.

Requirements

  • Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or a relevant technical field.
  • 5+ years of experience.
  • Experience building large-scale systems from scratch; prior experience with container-based deployment systems (e.g., Kubernetes) is beneficial.
  • Strong coding skills in one or more of Python, Go, Rust, or C/C++.
  • Solid foundation in algorithms and data structures, operating systems, and computer architecture.
  • Strong understanding of AI and related technologies is a plus.
  • Ability to quickly grasp new concepts and thrive in evolving situations.

Preferred / Ways to stand out

  • Graduate-level education or relevant practical research background.
  • Practical experience building and optimizing AI applications.
  • Proficiency with container technologies such as containerd, CRI-O, Linux namespaces, and CRIU.
  • Experience with NVIDIA GPU technologies such as CUDA graphs and driver/runtime internals.

Benefits

  • Base salary range: 116,250 CAD - 201,500 CAD (determined based on location, experience, and comparable roles).
  • Eligible for equity and additional benefits (see company benefits page).
  • Applications for this job will be accepted at least until September 6, 2025.