Senior Software Engineer, AI Systems - NeMo RL

at NVIDIA

📍 Toronto, Canada

CAD 116,250-247,000 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Software Development @ 6, Docker @ 4, Kubernetes @ 4, DevOps @ 4, Python @ 6, GCP @ 4, CI/CD @ 4, Algorithms @ 4, Distributed Systems @ 4, Machine Learning @ 7, AWS @ 4, Azure @ 4, Communication @ 4, Planning @ 4, API @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 3

Details

We are seeking highly skilled and motivated software engineers to join the NeMo RL team and help AI practitioners develop and deploy large language models (LLMs) using reinforcement learning (RL) techniques with the NeMo RL framework. The role focuses on building high-performance, multi-node distributed training systems for large-scale RL models and productionizing them with fault tolerance and high software quality.

Responsibilities

Design and implement highly efficient distributed training systems for large-scale RL models.
Optimize parallelism strategies to improve performance and scalability across hundreds or thousands of GPUs.
Develop low-level systems components and algorithms to maximize throughput and minimize memory and compute bottlenecks.
Productionize the training systems with fault tolerance capabilities and uncompromised software quality.
Collaborate with researchers and engineers to productionize cutting-edge model architectures and training techniques.
Contribute to the design of APIs, abstractions, and UX that make it easier to scale models while maintaining usability and flexibility.
Profile, debug, and tune performance at the model, system, and hardware levels.
Participate in design discussions, code reviews, and technical planning to ensure the product aligns with business goals.
Stay up to date with the latest advancements in large-scale model training and help translate research into practical, robust systems.

Requirements

Bachelor's, Master's, or PhD in Computer Science/Engineering, Software Engineering, a related field, or equivalent experience.
3+ years of software development experience, preferably with Python and C++.
Deep understanding of machine learning pipelines and workflows, distributed systems, parallel computing, and high-performance computing principles.
Hands-on experience with large-scale training of deep learning models using frameworks such as PyTorch, Megatron Core, or DeepSpeed.
Experience optimizing compute, memory, and communication performance in large model training workflows.
Familiarity with GPU programming, CUDA, NCCL, and performance profiling tools.
Solid grasp of deep learning fundamentals, especially as they relate to reinforcement learning and training dynamics.
Ability to work closely with research and engineering teams, translating evolving needs into technical requirements and robust code.
Excellent problem-solving skills and the ability to debug complex systems.
A passion for building high-impact tools that push the boundaries of large-scale AI.

Preferred / Ways to Stand Out

Background building and optimizing LLM pre-training or post-training frameworks (DeepSpeed, torchtitan, Nanotron, verl).
Experience building and optimizing LLM inference engines (vLLM, SGLang).
Experience building ML compilers such as Triton or TorchDynamo/TorchInductor.
Experience with cloud platforms (AWS, GCP, Azure), containerization (Docker), and orchestration infrastructure (Kubernetes, Slurm).
Exposure to DevOps practices, CI/CD pipelines, and infrastructure as code.

Compensation & Benefits

Base salary ranges (determined by location, experience, and peer pay):
- Level 3: 116,250 CAD - 201,500 CAD
- Level 4: 142,500 CAD - 247,000 CAD
Eligible for equity and benefits (see company benefits link).

Additional Information

Location: Toronto, Canada
Employment type: Full time
Hybrid role (#LI-Hybrid)
Applications accepted at least until August 22, 2025.