Senior Data Center Performance Engineer - Benchmarking and Optimization

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and know all the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding this level.

- Docker @ 4
- Kubernetes @ 4
- Linux @ 7
- Python @ 4
- TensorFlow @ 4
- Networking @ 4
- Parallel Programming @ 3
- Performance Monitoring @ 4
- Performance Optimization @ 4
- System Architecture @ 7
- PyTorch @ 4
- CUDA @ 3
- GPU @ 3

Details

NVIDIA is expanding its data center platform ecosystem from single-node HGX/DGX systems to rack-scale architectures built around large, multi-node NVLink domains. These platforms combine NVIDIA GPUs, NVLink, InfiniBand networking, Grace CPUs, and an optimized AI/HPC software stack. This role leads performance benchmarking and optimization efforts to ensure that data center solutions deliver industry-leading performance for AI training, inference, and HPC workloads at scale.

Responsibilities

Design and execute comprehensive performance benchmarking strategies for data center platforms and products.
Characterize real-world AI training, inference, and HPC workloads at scale.
Define, track, and report key performance indicators (throughput, latency, efficiency, scaling).
Build automation tools and frameworks for performance monitoring and analysis.
Identify and analyze performance bottlenecks across compute, memory, network, and storage subsystems.
Work closely with architecture, hardware, software, networking, storage, and customer teams to resolve performance issues.
Drive performance improvements through system tuning, configuration optimization, and architectural recommendations for future systems.

Requirements

M.S. or Ph.D. in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
8+ years of experience in performance engineering or system architecture.
Deep understanding of computer architecture, hardware-software interaction, and computing at scale.
Strong proficiency with performance profiling tools (Linux perf, NVIDIA Nsight Systems).
Familiarity with GPU computing and parallel programming (CUDA).
Background with HPC networking technologies (InfiniBand, RoCE, NVLink).
Programming skills in Python, C++, and shell scripting.
Excellent analytical and problem-solving abilities; adaptability and a passion for learning new technologies.
Ability to communicate effectively and work with cross-functional global teams.

Ways to Stand Out

Experience with AI/ML frameworks (PyTorch, TensorFlow, JAX).
Knowledge of MPI, collective communications (NCCL), and distributed training/inference.
Familiarity with NVIDIA DGX/HGX platforms and other data center solutions.
Experience with containers, cloud provisioning and scheduling tools (Docker, Kubernetes, SLURM).
Understanding of storage systems and I/O performance.
Track record of performance optimization in production environments.
Experience with AI code generation tools.

Compensation and Benefits

Base salary ranges by level:
- Level 4: 184,000 USD - 287,500 USD
- Level 5: 224,000 USD - 356,500 USD
Eligible for equity and benefits.

Location & Schedule

Location: Santa Clara, California, United States.
Full-time role. Working hours are not specified; a standard 40-hour week is assumed.

Additional Information

Applications accepted through at least December 20, 2025.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.