Senior Deep Learning Systems Engineer, Datacenters

at Nvidia

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Software Development @ 4, Docker @ 4, Linux @ 4, Python @ 4, TensorFlow @ 4, Bash @ 4, Networking @ 4, Performance Monitoring @ 4, System Architecture @ 7, PyTorch @ 4, CUDA @ 4, GPU @ 4

Details

As NVIDIA expands its datacenter business, this role focuses on extracting maximum performance and efficiency from datacenter-class hardware and establishing data-driven approaches to hardware design and system software development. The Deep Learning Systems Engineer will analyze performance and power consumption of deep learning applications on datacenter systems and influence the design and optimization of next-generation datacenters and the Deep Learning software stack.

Responsibilities

Develop software infrastructure to characterize and analyze a broad range of Deep Learning applications.
Evolve cost-efficient datacenter architectures tailored to the needs of Large Language Models (LLMs).
Develop analysis and profiling tools in Python, Bash, and C++ to measure key performance metrics of DL workloads running on NVIDIA systems.
Analyze system and software characteristics of DL applications (CPU, GPU, networking, and I/O interactions with DL architectures).
Develop methodologies and tools to measure key performance metrics and estimate potential for efficiency improvements.

Requirements

Bachelor’s degree in Electrical Engineering, Computer Science, or equivalent experience (Master’s or PhD preferred).
8 years or more of relevant experience.
Experience in at least one of the following areas:
- System software: Operating Systems (Linux), compilers, GPU kernels (CUDA), DL frameworks (PyTorch, TensorFlow).
- Silicon architecture and performance modeling/analysis: CPU, GPU, memory or network architecture.
Programming experience in C/C++ and Python.
Exposure to containerization platforms (Docker) and datacenter workload managers (Slurm) is a plus.
Deep understanding of computer system architecture and performance analysis; demonstrated hands-on experience in these domains.
Demonstrated ability to work in virtual/multi-site environments and to own tasks end-to-end.

Preferred / Ways to stand out

Background in system software, OS internals, GPU kernels (CUDA), or DL frameworks (PyTorch, TensorFlow).
Experience with silicon performance monitoring or profiling tools (e.g., perf, gprof, nvidia-smi, DCGM).
In-depth performance modeling experience in CPU, GPU, memory or network architecture.
Exposure to containerization platforms (Docker) and workload managers (Slurm).
Prior experience working with multi-site or cross-functional teams.

Benefits & Additional Details

Base salary ranges by level:
- Level 4: 184,000 USD - 287,500 USD
- Level 5: 224,000 USD - 356,500 USD
Eligible for equity and additional benefits (link provided in original posting).
NVIDIA is an equal opportunity employer committed to diversity and inclusion.
Applications will be accepted at least until September 7, 2025.