Senior Data Center Performance Engineer - Benchmarking and Optimization

at Nvidia
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 7, Python @ 4, TensorFlow @ 4, Networking @ 4, Parallel Programming @ 3, Performance Monitoring @ 4, Performance Optimization @ 4, System Architecture @ 7, PyTorch @ 4, CUDA @ 3, GPU @ 3

Details

NVIDIA is expanding its data center platform ecosystem from single-node HGX/DGX systems to large multi-node NVLink domain rack architectures. These platforms combine NVIDIA GPUs, NVLink, InfiniBand networking, Grace CPUs, and an optimized AI/HPC software stack. This role leads performance benchmarking and optimization efforts to ensure data center solutions deliver industry-leading performance for AI training, inference, and HPC workloads at scale.

Responsibilities

  • Design and execute comprehensive performance benchmarking strategies for data center platforms and products.
  • Characterize real-world AI training, inference, and HPC workloads at scale.
  • Define, track, and report key performance indicators (throughput, latency, efficiency, scaling).
  • Build automation tools and frameworks for performance monitoring and analysis (a minimal sketch of such a harness follows this list).
  • Identify and analyze performance bottlenecks across compute, memory, network, and storage subsystems.
  • Work closely with architecture, hardware, software, networking, storage, and customer teams to resolve performance issues.
  • Drive performance improvements through system tuning, configuration optimization, and architectural recommendations for future systems.
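
The automation item above could start from something as small as the sketch below: a single-GPU matmul micro-benchmark that reports latency and throughput as KPIs. This is an illustrative example, not NVIDIA tooling; it assumes PyTorch and a CUDA-capable GPU, and the matrix size and iteration counts are arbitrary.

```python
# Minimal sketch of a KPI-style micro-benchmark: times a dense matmul on one
# GPU and reports latency and throughput. Sizes and iteration counts are
# illustrative; a real harness would sweep shapes, precisions, and node counts.
import torch


def benchmark_matmul(n: int = 8192, warmup: int = 10, iters: int = 50) -> None:
    assert torch.cuda.is_available(), "sketch assumes a CUDA-capable GPU"
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)

    # Warm-up so clocks, caches, and cuBLAS heuristics settle before timing.
    for _ in range(warmup):
        torch.matmul(a, b)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()

    ms_per_iter = start.elapsed_time(end) / iters      # latency (ms)
    tflops = (2 * n**3) / (ms_per_iter * 1e-3) / 1e12  # 2*n^3 FLOPs per matmul
    print(f"latency: {ms_per_iter:.2f} ms/iter, throughput: {tflops:.1f} TFLOP/s")


if __name__ == "__main__":
    benchmark_matmul()
```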

Requirements

  • M.S. or Ph.D. in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
  • 8+ years of experience in performance engineering or system architecture.
  • Deep understanding of computer architecture, hardware-software interaction, and computing at scale.
  • Strong proficiency with performance profiling tools such as Linux perf and NVIDIA Nsight Systems (see the annotation sketch after this list).
  • Familiarity with GPU computing and parallel programming (CUDA).
  • Background with HPC networking technologies (InfiniBand, RoCE, NVLink).
  • Programming skills in Python, C++, and shell scripting.
  • Excellent analytical and problem-solving abilities; adaptability and a passion for learning new technologies.
  • Ability to communicate effectively and work with cross-functional global teams.
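
On the profiling side, a common pattern with Nsight Systems is to annotate workload phases with NVTX ranges so they appear as named regions on the timeline. The sketch below is a minimal, hypothetical example assuming PyTorch on a CUDA GPU; the model, loss, and sizes are placeholders, and the `nsys` invocation in the comment uses common flags (check `nsys --help` for your version).

```python
# Minimal sketch: NVTX ranges make training phases visible as named regions in
# an Nsight Systems timeline, e.g.
#   nsys profile -o run python this_script.py
# (common flags; check `nsys --help` for your version). Model and loss are
# placeholders for illustration only.
import torch


def step(model: torch.nn.Module, batch: torch.Tensor) -> float:
    torch.cuda.nvtx.range_push("forward")
    out = model(batch)
    loss = out.float().square().mean()  # placeholder loss
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()
    return loss.item()


if __name__ == "__main__":
    assert torch.cuda.is_available(), "sketch assumes a CUDA-capable GPU"
    model = torch.nn.Linear(1024, 1024).cuda().half()
    batch = torch.randn(64, 1024, device="cuda", dtype=torch.float16)
    for _ in range(5):
        model.zero_grad()
        print(step(model, batch))
```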

Ways to Stand Out

  • Experience with AI/ML frameworks (PyTorch, TensorFlow, JAX).
  • Knowledge of MPI, collective communications (NCCL), and distributed training/inference (a bandwidth micro-benchmark sketch follows this list).
  • Familiarity with NVIDIA DGX/HGX platforms and other data center solutions.
  • Experience with containers, cloud provisioning and scheduling tools (Docker, Kubernetes, SLURM).
  • Understanding of storage systems and I/O performance.
  • Track record of performance optimization in production environments; experience with AI code generation tools.
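
For the collective-communications item, NVIDIA's nccl-tests suite is the usual benchmark, but a minimal PyTorch sketch of the same idea looks like the following. It assumes the NCCL backend and a torchrun launch; the buffer size, iteration counts, and file name are illustrative.

```python
# Minimal sketch of an all-reduce bandwidth micro-benchmark over NCCL.
# Launch (hypothetical file name) with, e.g.:
#   torchrun --nproc_per_node=8 allreduce_bench.py
# Buffer size and iteration counts are illustrative.
import os

import torch
import torch.distributed as dist


def main(numel: int = 256 * 1024 * 1024, warmup: int = 5, iters: int = 20) -> None:
    dist.init_process_group(backend="nccl")  # torchrun supplies rank/world size
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    buf = torch.ones(numel, device="cuda", dtype=torch.float32)  # ~1 GiB

    for _ in range(warmup):
        dist.all_reduce(buf)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        dist.all_reduce(buf)
    end.record()
    torch.cuda.synchronize()

    secs = start.elapsed_time(end) / 1e3 / iters  # elapsed_time is in ms
    gbytes = buf.numel() * buf.element_size() / 1e9
    if dist.get_rank() == 0:
        # Algorithm bandwidth; tools like nccl-tests also report bus bandwidth.
        print(f"all_reduce {gbytes:.2f} GB: {secs * 1e3:.2f} ms/iter, {gbytes / secs:.1f} GB/s")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```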

Compensation and Benefits

  • Base salary ranges by level:
    • Level 4: 184,000 USD - 287,500 USD
    • Level 5: 224,000 USD - 356,500 USD
  • Eligible for equity and benefits.

Location & Schedule

  • Location: Santa Clara, California, United States.
  • Full-time role; standard full-time hours (the posting does not specify, so assume 40 hours/week).

Additional Information

  • Applications accepted through at least December 20, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.