Senior Systems Performance Engineer

at Nvidia

📍 Santa Clara, United States

USD 136,000-258,750 per year

SENIOR

✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About the proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable and regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know all the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 4 Python @ 4 Networking @ 4 Debugging @ 6 System Architecture @ 4 LLM @ 4 CUDA @ 6 GPU @ 4 Deep Learning @ 4 AI @ 4 vLLM @ 4 Slurm @ 4 TensorRT @ 4

Details

NVIDIA is seeking a Senior Validation Engineer on the DGX Server Product Engineering Team to work with HW/SW engineers to develop and implement complex automated test plans for GPU-accelerated computing products. The role focuses on system architecture, performance modeling, GPU SKU bring-up, validation, and the use of industry-leading Deep Learning/AI applications for system-level stress and performance testing. The position requires on-site work in a hardware lab environment 5 days a week in Santa Clara, CA.

Responsibilities

Work on system architecture, design, and performance modeling and estimation across new models and new packages.
Enable GPU SKU bring-up, validation, and model enablement.
Develop system-level stress and performance testing strategies using industry-leading Deep Learning/AI applications.
Work with HW/SW engineers to develop and implement complex automated test plans for GPU-accelerated computing products.

Requirements

Ability to work on site in a hardware lab environment 5 days a week (Santa Clara, CA).
BSEE or BSCE or equivalent experience.
5+ years of experience validating and debugging complex systems.
Experience developing/running real-world ML/LLM workloads.
Mandatory skills: Dynamo, TensorRT, Slurm, BCM.
Preferred: Knowledge of vLLM and SGLang.
Proficiency in CUDA, cuBLAS and CUTLASS.
Deep understanding of computing architectures.
Coding experience with Python and running simulators.
Experience with datacenter products including system management, security, networking, and storage.

Ways to stand out

Background with x86/Arm server architectures and accelerated GPU computing.
Track record of continuous process improvement with a passion for tools and automation.

Compensation & Benefits

Base salary range (determined by location, experience, and comparable employees):
- Level 3: 136,000 USD - 212,750 USD
- Level 4: 168,000 USD - 258,750 USD
Eligible for equity and benefits. See www.nvidiabenefits.com for details.

Other

Applications accepted at least until April 5, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.