Senior AI Performance and Efficiency Engineer

at NVIDIA
USD 224,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Go @ 6, Python @ 6, GCP @ 3, Algorithms @ 7, Machine Learning @ 7, TensorFlow @ 3, AWS @ 3, Azure @ 3, Bash @ 6, Communication @ 4, Debugging @ 4, PyTorch @ 3, CUDA @ 4, Cloud Computing @ 3, GPU @ 4

Details

We are seeking a Senior AI/ML Performance and Efficiency Engineer, GPU Clusters at NVIDIA to join our AI Efficiency efforts. In this role you will play a pivotal part in making our researchers more productive by driving improvements across the entire stack. Your main focus will be working closely with internal customers to identify and fix infrastructure and application bottlenecks, enabling groundbreaking AI and ML research on GPU clusters. Together, we can build powerful, efficient, and scalable solutions as we shape the future of AI/ML technology!

Responsibilities

  • Collaborate closely with AI/ML researchers to make their ML models more efficient, leading to significant productivity improvements and cost savings.
  • Build tools and frameworks, and apply ML techniques, to detect and analyze efficiency bottlenecks and deliver productivity improvements for researchers (see the profiling sketch after this list).
  • Work with researchers on a variety of ML workloads across robotics, autonomous vehicles, large language models (LLMs), video workloads, and more.
  • Collaborate across engineering organizations to deliver efficiency in usage of hardware, software, and infrastructure.
  • Proactively monitor fleet-wide utilization patterns, analyze inefficiency patterns or discover new patterns, and deliver scalable solutions to resolve them.
  • Keep up to date with recent developments in AI/ML technologies and frameworks and advocate for their integration within the organization.
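
For concreteness, here is a minimal sketch of the kind of bottleneck analysis this role involves, using PyTorch's built-in profiler. The model and tensor shapes are placeholders; any training step can be profiled the same way.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Illustrative workload; substitute a real training step.
model = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(64, 4096, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        loss = model(x).sum()
        loss.backward()

# Rank operators by GPU time to surface the dominant kernels and bottlenecks.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

A table like this is often the first step before reaching for deeper tools such as Nsight Systems, since it quickly shows whether time is going to compute kernels, data movement, or CPU-side overhead.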

Requirements

  • BS or similar background in Computer Science or a related area (or equivalent experience).
  • 8+ years of experience designing and operating large-scale compute infrastructure.
  • Strong understanding of modern ML techniques and tools; experience with machine learning and deep learning concepts, algorithms, and models.
  • Experience investigating and resolving training and inference performance end-to-end.
  • Debugging and optimization experience with Nsight Systems and Nsight Compute.
  • Experience debugging large-scale distributed training using NCCL (a minimal reproducer sketch follows this list).
  • Proficiency in programming and scripting languages such as Python, Go, and Bash.
  • Familiarity with cloud computing platforms (e.g., AWS, GCP, Azure).
  • Experience with parallel computing frameworks and paradigms.
  • Dedication to ongoing learning and staying updated on new technologies and methods in AI/ML infrastructure.
  • Excellent communication and collaboration skills, with the ability to work effectively with diverse teams.
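
As a flavor of the NCCL-level debugging mentioned above, here is a minimal all-reduce reproducer; the script name and message size are illustrative. Running it under NCCL_DEBUG=INFO prints the topology and transport choices NCCL makes, and the per-rank timings help isolate slow ranks or links.

```python
# Launch (assumed filename):
#   NCCL_DEBUG=INFO torchrun --nproc_per_node=8 allreduce_check.py
import os
import torch
import torch.distributed as dist

def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    t = torch.ones(1 << 22, device="cuda")  # ~16 MiB fp32 message

    # Time a burst of all-reduces with CUDA events; outlier ranks point
    # at slow GPUs, NICs, or links.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(50):
        dist.all_reduce(t)
    end.record()
    torch.cuda.synchronize()
    print(f"rank {rank}: {start.elapsed_time(end) / 50:.3f} ms per all-reduce")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```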

Ways to Stand Out / Nice-to-Have

  • Background with NVIDIA GPUs and CUDA programming.
  • Experience with MLPerf benchmarking (see the throughput sketch after this list).
  • Familiarity with InfiniBand (IB) and RDMA.
  • Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads.
  • Familiarity with deep learning frameworks such as PyTorch and TensorFlow.
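
In the spirit of the GPU and benchmarking items above, a back-of-envelope throughput check using torch.utils.benchmark (the matrix size and dtype are arbitrary); comparing the measured TFLOP/s against the GPU's peak is a quick efficiency sanity check.

```python
import torch
from torch.utils import benchmark

n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

# Timer performs a warmup run and handles CUDA synchronization.
m = benchmark.Timer(stmt="a @ b", globals={"a": a, "b": b}).timeit(100)

# A square n x n matmul performs 2 * n^3 floating-point operations.
tflops = 2 * n**3 / m.mean / 1e12
print(f"{m.mean * 1e3:.3f} ms/iter, ~{tflops:.1f} TFLOP/s")
```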

Compensation & Other Details

  • Base salary ranges (determined by location, experience, and similar roles):
    • Level 4: 184,000 USD - 287,500 USD
    • Level 5: 224,000 USD - 356,500 USD
  • Eligible for equity and benefits.
  • Applications accepted at least until November 29, 2025.
  • Location: Santa Clara, CA, US.