Senior Deep Learning Engineer – Autonomous Vehicles

at NVIDIA

📍 Santa Clara, United States

USD 224,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 7, Python @ 6, Distributed Systems @ 4, Leadership @ 4, Networking @ 7, Performance Monitoring @ 4, Experimentation @ 4, PyTorch @ 4, GPU @ 4, Deep Learning @ 4, Observability @ 4, AI @ 4, InfiniBand @ 7, Reinforcement Learning @ 3, Profiling @ 4, NCCL @ 4, Slurm @ 7, HPC @ 8

Details

NVIDIA is seeking a Senior Deep Learning Systems Engineer to advance the Autonomous Vehicles project by building and scaling training libraries and infrastructure for end-to-end autonomous driving models. The role focuses on enabling training on multi-thousand GPU clusters and improving iteration speed, safety, and developer productivity through robust, high-performance infrastructure.

Responsibilities

Craft, scale, and harden deep learning infrastructure libraries and frameworks for training on multi-thousand GPU clusters.
Improve efficiency across the training stack: data loaders, distributed training, scheduling, and performance monitoring.
Build robust training pipelines and libraries to handle massive video datasets and enable rapid experimentation.
Collaborate with researchers, model engineers, and internal platform teams to enhance efficiency, minimize stalls, and improve training availability.
Own core infrastructure components such as orchestration libraries, distributed training frameworks, and fault-resilient training systems.
Partner with leadership to ensure infrastructure scales with growing GPU capacity and dataset size while maintaining developer efficiency and stability.

Requirements

BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent experience.
12+ years of professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure.
Extensive knowledge of deep learning frameworks (PyTorch preferred) and large-scale training (DDP/FSDP, NCCL, tensor and pipeline parallelism).
Strong systems background including datacenter networking (RoCE, InfiniBand), parallel filesystems (Lustre), storage systems, and schedulers (Slurm, Kubernetes).
Proficiency in Python and C++, with experience writing production-grade libraries, orchestration layers, and automation tools.
Experience with performance profiling and optimizing large-scale training workflows.
Ability to work closely with cross-functional teams (ML researchers, infra engineers, product leads) and translate requirements into robust systems.

Ways to stand out

Experience scaling large GPU training clusters with >1,000 GPUs.
Contributions to open-source ML systems libraries (e.g., PyTorch, NCCL, FSDP, schedulers, storage clients).
Expertise in fault resilience and high availability, including elastic training and large-scale observability.
Hands-on leadership experience as a technical authority for ML systems engineering.
Familiarity with reinforcement learning at scale, particularly for simulation-heavy workloads.

Compensation & Benefits

Base salary range: 224,000 USD - 356,500 USD (determined by location, experience, and the pay of employees in similar positions).
Eligible for equity and benefits (see the NVIDIA benefits page).

Other details

Applications will be accepted until at least July 3, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and committed to fostering an inclusive work environment.