Senior Datacenter Performance Model Engineer

at NVIDIA
USD 152,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Software Development @ 6, Kubernetes @ 4, Linux @ 4, Python @ 7, TensorFlow @ 4, Communication @ 7, Networking @ 4, Debugging @ 7, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 4, AI @ 4, Slurm @ 4, Performance Analysis @ 4

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, we are tapping into the unlimited potential of AI to define the next era of computing. This software engineering role involves developing datacenter-scale performance modeling and prediction tools for AI researchers who run AI workloads on GPU clusters.

Responsibilities

  • Build performance modeling and prediction tools for AI workloads at datacenter scale.
  • Develop production tools and workflows used by multiple teams within NVIDIA and by its customers.
  • Automate workflows, including searching for the most efficient configurations across millions of parameters.
  • Partner with hardware and software architects to propose new features or improve existing ones based on real-world use cases.

Requirements

  • BS+ in Computer Science or related field (or equivalent experience) and 5+ years of software development experience.
  • Strong software skills in design and coding (C++ and Python), analytical thinking, and debugging.
  • Good understanding of deep learning frameworks such as PyTorch and TensorFlow, and of distributed training and inference.
  • Knowledge of GPU cluster job scheduling (Slurm or Kubernetes), storage, and networking.
  • Experience with NVIDIA GPUs and CUDA programming.
  • Motivated self-starter with strong problem-solving skills and customer-facing communication skills.
  • Ability to work with multiple global groups concurrently and a passion for continuous learning.

Ways to stand out

  • Proven software engineering experience deploying software at datacenter scale.
  • Solid experience in large AI job performance analysis for training and inference workloads.
  • Knowledge of Linux device drivers and/or compiler implementation.
  • Knowledge of GPU and/or CPU architecture and general computer architecture principles.

Compensation and benefits

  • Base salary ranges: 152,000 USD - 241,500 USD for Level 3 and 184,000 USD - 287,500 USD for Level 4.
  • Eligible for equity and benefits.

Other information

  • Applications accepted at least until June 1, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer committed to fostering an inclusive work environment.