Senior AI Performance And Efficiency Engineer

at Nvidia

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Machine Learning, LLM

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know all the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Go @ 6, Python @ 6, GCP @ 3, TensorFlow @ 3, AWS @ 3, Azure @ 3, Bash @ 6, Communication @ 4, Debugging @ 4, PyTorch @ 3, CUDA @ 4, Cloud Computing @ 3, GPU @ 4, Deep Learning @ 3, AI @ 4, InfiniBand @ 3, Robotics @ 4, NCCL @ 4, HPC @ 4

Details

We are seeking a Senior AI/ML Performance and Efficiency Engineer, GPU Clusters, to join NVIDIA's AI Efficiency efforts. In this role you will make researchers more efficient by implementing improvements across the entire stack, and you will collaborate with customers to identify and address infrastructure and application inefficiencies, enabling scalable AI/ML research on GPU clusters.

Responsibilities

Collaborate closely with AI/ML researchers to make ML models more efficient, delivering productivity improvements and cost savings.
Build tools and frameworks, and apply ML techniques, to detect and analyze efficiency bottlenecks and deliver productivity improvements for researchers.
Work with researchers on a variety of ML workloads across robotics, autonomous vehicles, large language models (LLMs), video, and more.
Collaborate across engineering organizations to deliver efficiency in hardware, software, and infrastructure usage.
Proactively monitor fleet-wide utilization patterns, analyze existing inefficiency patterns or discover new ones, and deliver scalable solutions.
Keep up to date with recent developments in AI/ML technologies, frameworks, and successful strategies and advocate for their integration.

Requirements

BS in Computer Science or a related area, or equivalent experience.
At least 5 years of experience designing and operating large-scale compute infrastructure.
Strong understanding of modern ML techniques and tools.
Experience investigating and resolving training and inference performance end-to-end.
Debugging and optimization experience with Nsight Systems and Nsight Compute.
Experience debugging large-scale distributed training using NCCL.
Proficiency in programming and scripting languages such as Python, Go, and Bash.
Familiarity with cloud computing platforms (e.g., AWS, GCP, Azure).
Experience with parallel computing frameworks and paradigms.
Dedication to ongoing learning and staying updated on AI/ML infrastructure technologies.
Excellent communication and collaboration skills.

Ways to stand out / Preferred

Background with NVIDIA GPUs and CUDA programming.
Experience with NCCL and MLPerf benchmarking.
Familiarity with InfiniBand (IBOP) and RDMA.
Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads.
Familiarity with deep learning frameworks such as PyTorch and TensorFlow.

Compensation & Benefits

Base salary ranges (location, level, and experience dependent):
- Level 3: 152,000 USD - 241,500 USD
- Level 4: 184,000 USD - 287,500 USD
Eligible for equity and a comprehensive benefits package. Link to benefits: https://www.nvidia.com/en-us/benefits/

Other

Applications are accepted at least until March 23, 2026.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.