Senior Data Center Performance Engineer - Benchmarking and Optimization
at NVIDIA
USD 184,000-356,500 per year
Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Linux @ 7, Python @ 4, TensorFlow @ 4, Networking @ 4, Parallel Programming @ 3, Performance Monitoring @ 4, Performance Optimization @ 4, System Architecture @ 7, PyTorch @ 4, CUDA @ 3, GPU @ 3
Details
NVIDIA is expanding its data center platform ecosystem from single-node HGX/DGX systems to large multi-node NVLink domain rack architectures. These platforms combine NVIDIA GPUs, NVLink, InfiniBand networking, Grace CPUs, and an optimized AI/HPC software stack. This role leads performance benchmarking and optimization efforts to ensure data center solutions deliver industry-leading performance for AI training, inference, and HPC workloads at scale.
Responsibilities
- Design and execute comprehensive performance benchmarking strategies for data center platforms and products.
- Characterize real-world AI training, inference, and HPC workloads at scale.
- Define, track, and report key performance indicators (throughput, latency, efficiency, scaling).
- Build automation tools and frameworks for performance monitoring and analysis.
- Identify and analyze performance bottlenecks across compute, memory, network, and storage subsystems.
- Work closely with architecture, hardware, software, networking, storage, and customer teams to resolve performance issues.
- Drive performance improvements through system tuning, configuration optimization, and architectural recommendations for future systems.
Requirements
- M.S. or Ph.D. in Computer Science, Electrical Engineering or related field (or equivalent experience).
- 8+ years of experience in performance engineering or system architecture.
- Deep understanding of computer architecture, hardware-software interaction, and computing at scale.
- Strong proficiency with performance profiling tools (Linux perf, NVIDIA Nsight Systems).
- Familiarity with GPU computing and parallel programming (CUDA).
- Background in HPC networking technologies (InfiniBand, RoCE, NVLink).
- Programming skills in Python, C++, and shell scripting.
- Excellent analytical and problem-solving abilities; adaptability and passion to learn new technologies.
- Ability to communicate effectively and work with cross-functional global teams.
Ways to Stand Out
- Experience with AI/ML frameworks (PyTorch, TensorFlow, JAX).
- Knowledge of MPI, collective communications (NCCL), and distributed training/inference.
- Familiarity with NVIDIA DGX/HGX platforms and other data center solutions.
- Experience with containers, cloud provisioning and scheduling tools (Docker, Kubernetes, SLURM).
- Understanding of storage systems and I/O performance.
- Track record of performance optimization in production environments.
- Experience with AI code generation tools.
Compensation and Benefits
- Base salary ranges by level:
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- Eligible for equity and benefits.
Location & Schedule
- Location: Santa Clara, California, United States.
- Full-time role; hours are not specified in the posting, so a standard 40-hour week is assumed.
Additional Information
- Applications accepted through at least December 20, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.