Senior AI System Engineer

at Nvidia
USD 148,000-287,500 per year
SENIOR
On-site

Required Skills & Competences

Python (6), Algorithms (4), Data Analysis (6), LLM (4), PyTorch (4), CUDA (4), GPU (4)

Details

At NVIDIA, we are at the forefront of advancing the capabilities of artificial intelligence. We are seeking an ambitious and forward-thinking AI/ML System Performance Engineer to contribute to next-generation inference optimizations and deliver industry-leading performance. In this role, you will investigate and prototype scalable inference strategies, driving down per-token latency and maximizing system throughput by applying cross-stack optimizations that span algorithmic innovations, system-level techniques, and hardware-level enhancements.

Sample projects include Helix Parallelism and Disaggregated Inference.

Responsibilities

  • Optimize inference deployment by pushing the Pareto frontier of accuracy, throughput, and interactivity at datacenter scale (a toy Pareto-frontier sketch follows this list).
  • Develop high-fidelity performance models to prototype emerging algorithmic techniques and hardware optimizations to drive model-hardware co-design for Generative AI.
  • Prioritize features to guide future software and hardware roadmaps based on detailed performance modeling and analysis.
  • Model end-to-end performance impact of emerging GenAI workflows (for example: Agentic Pipelines, inference-time compute scaling) to understand future datacenter needs.
  • Collaborate across teams including deep learning research, framework development, compiler and systems engineering, and silicon architecture.
  • Keep up with the latest deep learning research and apply it to system and deployment-level optimizations.
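
As a minimal sketch of the Pareto-frontier framing above, assuming per-configuration measurements of system throughput and per-token latency; the configuration names and numbers below are invented for illustration, not NVIDIA data or tooling:

    from dataclasses import dataclass

    @dataclass
    class Config:
        name: str
        tokens_per_sec: float  # system throughput (higher is better)
        ms_per_token: float    # per-token latency (lower is better)

    def pareto_frontier(configs):
        """Keep configurations no other configuration beats on both axes."""
        def dominated(c):
            return any(
                o.tokens_per_sec >= c.tokens_per_sec
                and o.ms_per_token <= c.ms_per_token
                and (o.tokens_per_sec > c.tokens_per_sec
                     or o.ms_per_token < c.ms_per_token)
                for o in configs
            )
        return sorted((c for c in configs if not dominated(c)),
                      key=lambda c: c.ms_per_token)

    candidates = [
        Config("tp4_bs16", 9_000, 30.0),
        Config("tp8_bs32", 12_000, 45.0),
        Config("tp4_bs64", 14_000, 80.0),
        Config("tp8_bs8", 7_000, 55.0),  # dominated by tp4_bs16
    ]
    for c in pareto_frontier(candidates):
        print(f"{c.name}: {c.tokens_per_sec} tok/s at {c.ms_per_token} ms/token")

A real evaluation adds an accuracy axis and far more measurement care; the point here is only the dominance test that defines the frontier.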

Requirements

  • Master's degree (or equivalent experience) in Computer Science, Electrical Engineering, or related fields.
  • 3+ years of hands-on experience in system-level evaluation of AI/ML workloads, or in performance analysis, modeling, and optimization for AI.
  • Strong background in computer architecture, roofline modeling, queuing theory, and statistical performance analysis techniques (see the sketch after this list).
  • Solid understanding of ML fundamentals, model parallelism and inference serving techniques.
  • Proficiency in Python (and optionally C++) for simulator design and data analysis.
  • Experience with GPU computing (CUDA).
  • Experience with deep learning frameworks and inference stacks such as PyTorch, TensorRT-LLM, vLLM, or SGLang.
  • Growth mindset and pragmatic "measure, iterate, deliver" approach.
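
A back-of-envelope sketch of two modeling tools named above, roofline analysis and M/M/1 queuing; the peak-compute and bandwidth figures are generic placeholders, not numbers for any particular GPU:

    def roofline_tflops(flops_per_byte, peak_tflops=100.0, mem_bw_tb_s=2.0):
        """Attainable throughput = min(peak compute, bandwidth * intensity)."""
        return min(peak_tflops, mem_bw_tb_s * flops_per_byte)

    def mm1_mean_latency_s(arrival_rate, service_rate):
        """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
        assert service_rate > arrival_rate, "queue is unstable"
        return 1.0 / (service_rate - arrival_rate)

    # Low-batch decode is GEMV-like (low arithmetic intensity), so memory-bound.
    for ai in (1, 10, 50, 200):
        t = roofline_tflops(ai)
        kind = "compute-bound" if t >= 100.0 else "memory-bound"
        print(f"AI={ai:4d} FLOP/B -> {t:6.1f} TFLOP/s ({kind})")

    # Queuing delay grows sharply as utilization (lambda/mu) approaches 1.
    for lam in (50, 80, 95):
        print(f"lambda={lam}/s, mu=100/s -> {mm1_mean_latency_s(lam, 100):.3f} s")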

Ways to Stand Out

  • Comfortable defining metrics, designing experiments, and visualizing large performance datasets to identify resource bottlenecks (a small example follows this list).
  • Proven track record of working in cross-functional teams spanning algorithms, software and hardware architecture.
  • Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.
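
And a small, hypothetical example of that metrics work: summarizing synthetic per-request latencies into the tail percentiles typically used to expose serving bottlenecks (the log-normal samples are fabricated for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    latencies_ms = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)  # synthetic

    for p in (50, 90, 95, 99):
        print(f"p{p:>2}: {np.percentile(latencies_ms, p):7.1f} ms")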

Compensation & Benefits

  • Your base salary will be determined based on location, experience, and pay of employees in similar positions.
  • Base salary ranges provided by NVIDIA for this role:
    • Level 3: 148,000 USD - 235,750 USD
    • Level 4: 184,000 USD - 287,500 USD
  • You will also be eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).

Additional Information

  • Applications for this job will be accepted at least until September 1, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.