Senior AI Performance And Efficiency Engineer

at NVIDIA
USD 152,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Machine Learning LLM

Required Skills & Competences

Go @ 6 Python @ 6 GCP @ 3 TensorFlow @ 3 AWS @ 3 Azure @ 3 Bash @ 6 Communication @ 4 Debugging @ 4 PyTorch @ 3 CUDA @ 4 Cloud Computing @ 3 GPU @ 4 Deep Learning @ 3 AI @ 4 InfiniBand @ 3 Robotics @ 4 NCCL @ 4 HPC @ 4

Details

We are seeking a Senior AI/ML Performance and Efficiency Engineer, GPU Clusters, to join NVIDIA's AI Efficiency efforts. In this role, you will improve researcher efficiency by implementing improvements across the entire stack and by working with customers to identify and resolve infrastructure and application inefficiencies, enabling scalable AI/ML research on GPU clusters.

Responsibilities

  • Collaborate closely with AI/ML researchers to make ML models more efficient, delivering productivity improvements and cost savings.
  • Build tools and frameworks, and apply ML techniques, to detect and analyze efficiency bottlenecks and deliver productivity improvements for researchers.
  • Work with researchers on a variety of ML workloads across robotics, autonomous vehicles, large language models (LLMs), video, and more.
  • Collaborate across engineering organizations to deliver efficiency in hardware, software, and infrastructure usage.
  • Proactively monitor fleet-wide utilization patterns, analyze existing inefficiency patterns or discover new ones, and deliver scalable solutions.
  • Keep up to date with recent developments in AI/ML technologies, frameworks, and successful strategies, and advocate for their integration.

Requirements

  • BS in Computer Science or a related area (or equivalent experience).
  • 5+ years of experience designing and operating large-scale compute infrastructure.
  • Strong understanding of modern ML techniques and tools.
  • Experience investigating and resolving training and inference performance end-to-end.
  • Debugging and optimization experience with Nsight Systems and Nsight Compute.
  • Experience debugging large-scale distributed training using NCCL.
  • Proficiency in programming and scripting languages such as Python, Go, and Bash.
  • Familiarity with cloud computing platforms (e.g., AWS, GCP, Azure).
  • Experience with parallel computing frameworks and paradigms.
  • Dedication to ongoing learning and staying updated on AI/ML infrastructure technologies.
  • Excellent communication and collaboration skills.

Ways to stand out / Preferred

  • Background with NVIDIA GPUs and CUDA programming.
  • Experience with NCCL and MLPerf benchmarking.
  • Familiarity with InfiniBand (IPoIB) and RDMA.
  • Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads.
  • Familiarity with deep learning frameworks such as PyTorch and TensorFlow.

Compensation & Benefits

  • Base salary ranges (location, level, and experience dependent):
    • Level 3: 152,000 USD - 241,500 USD
    • Level 4: 184,000 USD - 287,500 USD
  • Eligible for equity and a comprehensive benefits package. Link to benefits: https://www.nvidia.com/en-us/benefits/

Other

  • Applications accepted at least until March 23, 2026.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.