Required Skills & Competences
Python @ 6, Algorithms @ 4, Data Analysis @ 6, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
At NVIDIA, we are at the forefront of advancing the capabilities of artificial intelligence. We are seeking an ambitious and forward-thinking AI/ML System Performance Engineer to contribute to next-generation inference optimizations and deliver industry-leading performance. In this role you will investigate and prototype scalable inference strategies—driving down per-token latency and maximizing system throughput by applying cross-stack optimizations that span algorithmic innovations, system-level techniques, and hardware-level enhancements.
Sample projects include Helix Parallelism and Disaggregated Inference.
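The role description above centers on the trade-off between per-token latency (interactivity) and system throughput. As a purely illustrative sketch, the Python snippet below models decode as memory-bandwidth-bound; every constant (model size, HBM bandwidth, KV-cache traffic per sequence) is an assumed placeholder, not an NVIDIA hardware or model specification.

```python
# Minimal sketch of the latency/throughput trade-off in LLM decode, assuming a
# purely memory-bandwidth-bound regime. All constants below are illustrative
# placeholders, not NVIDIA hardware or model specifications.

WEIGHT_BYTES = 70e9 * 2       # hypothetical 70B-parameter model stored in FP16
HBM_BANDWIDTH = 3.0e12        # assumed ~3 TB/s of HBM bandwidth per GPU
KV_BYTES_PER_SEQ = 2e9        # assumed KV-cache bytes read per sequence per step

def decode_step_latency(batch_size: int) -> float:
    """Seconds per decode step: weights are read once, KV cache once per sequence."""
    bytes_moved = WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ
    return bytes_moved / HBM_BANDWIDTH

for batch in (1, 8, 32, 128):
    step_s = decode_step_latency(batch)
    per_token_ms = step_s * 1e3      # interactivity: latency per generated token
    tokens_per_s = batch / step_s    # system throughput across the whole batch
    print(f"batch={batch:4d}  per-token latency={per_token_ms:8.2f} ms  "
          f"throughput={tokens_per_s:9.0f} tok/s")
```

Under these assumptions, larger batches amortize weight reads and raise throughput, but each generated token takes longer; pushing that Pareto frontier is the core of the role.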
Responsibilities
- Optimize inference deployment by pushing the Pareto frontier of accuracy, throughput and interactivity at datacenter scale.
- Develop high-fidelity performance models to prototype emerging algorithmic techniques and hardware optimizations to drive model-hardware co-design for Generative AI.
- Prioritize features to guide future software and hardware roadmaps based on detailed performance modeling and analysis.
- Model end-to-end performance impact of emerging GenAI workflows (for example: Agentic Pipelines, inference-time compute scaling) to understand future datacenter needs.
- Collaborate across teams including deep learning research, framework development, compiler and systems engineering, and silicon architecture.
- Keep up with the latest deep learning research and apply it to system and deployment-level optimizations.
Requirements
- Master's degree (or equivalent experience) in Computer Science, Electrical Engineering, or related fields.
- 3+ years of hands-on experience in system-level evaluation of AI/ML workloads, or in performance analysis, modeling, and optimization for AI.
- Strong background in computer architecture, roofline modeling, queuing theory, and statistical performance analysis techniques (see the roofline sketch after this list).
- Solid understanding of ML fundamentals, model parallelism and inference serving techniques.
- Proficiency in Python (and optionally C++) for simulator design and data analysis.
- Experience with GPU computing (CUDA).
- Experience with deep learning frameworks and inference engines such as PyTorch, TensorRT-LLM, vLLM, or SGLang.
- Growth mindset and pragmatic "measure, iterate, deliver" approach.
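As a concrete illustration of the roofline modeling mentioned in the requirements, the sketch below picks assumed peak compute and bandwidth figures and classifies two example kernels as compute- or memory-bound. All numbers, kernel names, and arithmetic intensities are assumptions for illustration only, not published GPU specifications.

```python
# Minimal roofline sketch: classify a kernel as compute- or memory-bound from its
# arithmetic intensity. Peak numbers are illustrative assumptions, not published
# specifications for any particular GPU.

PEAK_FLOPS = 1.0e15   # assumed peak math throughput in FLOP/s (e.g. FP16 tensor cores)
PEAK_BW = 3.0e12      # assumed peak HBM bandwidth in bytes/s

def attainable_flops(arithmetic_intensity: float) -> float:
    """Roofline model: performance is capped by either compute or memory traffic."""
    return min(PEAK_FLOPS, arithmetic_intensity * PEAK_BW)

ridge_point = PEAK_FLOPS / PEAK_BW   # FLOP/byte where the memory and compute roofs meet

# Rough, assumed arithmetic intensities (FLOP per byte moved) for two common kernels:
kernels = {
    "decode GEMV (batch 1)": 1.0,         # streams weights once, little reuse
    "prefill GEMM (long prompt)": 400.0,  # reuses weights across many tokens
}

print(f"ridge point ~ {ridge_point:.0f} FLOP/byte")
for name, ai in kernels.items():
    bound = "compute-bound" if ai >= ridge_point else "memory-bound"
    print(f"{name:28s} AI={ai:6.1f}  attainable={attainable_flops(ai):.2e} FLOP/s  ({bound})")
```

This kind of back-of-the-envelope analysis, combined with queuing and statistical methods, is the starting point for the higher-fidelity performance models the role calls for.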
Ways to Stand Out
- Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.
- Proven track record of working in cross-functional teams spanning algorithms, software and hardware architecture.
- Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.
Compensation & Benefits
- Your base salary will be determined based on location, experience, and pay of employees in similar positions.
- Base salary ranges provided by NVIDIA for this role:
- Level 3: 148,000 USD - 235,750 USD
- Level 4: 184,000 USD - 287,500 USD
- You will also be eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).
Additional Information
- Applications for this job will be accepted at least until September 1, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.