Required Skills & Competences
Python @ 6, Algorithms @ 4, Data Analysis @ 6, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
At NVIDIA, we are at the forefront of advancing the capabilities of artificial intelligence. We are seeking an ambitious and forward-thinking AI/ML System Performance Engineer to contribute to next-generation inference optimizations and deliver industry-leading performance. In this role you will investigate and prototype scalable inference strategies—driving down per-token latency and maximizing system throughput by applying cross-stack optimizations that span algorithmic innovations, system-level techniques, and hardware-level enhancements.
Sample projects include Helix Parallelism and Disaggregated Inference.
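The role description above centers on the trade-off between per-token latency (interactivity) and system throughput. As a purely illustrative sketch, the Python snippet below models decode as memory-bandwidth-bound; every constant (model size, HBM bandwidth, KV-cache traffic per sequence) is an assumed placeholder, not an NVIDIA hardware or model specification.

```python
# Minimal sketch of the latency/throughput trade-off in LLM decode, assuming a
# purely memory-bandwidth-bound regime. All constants below are illustrative
# placeholders, not NVIDIA hardware or model specifications.

WEIGHT_BYTES = 70e9 * 2       # hypothetical 70B-parameter model stored in FP16
HBM_BANDWIDTH = 3.0e12        # assumed ~3 TB/s of HBM bandwidth per GPU
KV_BYTES_PER_SEQ = 2e9        # assumed KV-cache bytes read per sequence per step

def decode_step_latency(batch_size: int) -> float:
    """Seconds per decode step: weights are read once, KV cache once per sequence."""
    bytes_moved = WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ
    return bytes_moved / HBM_BANDWIDTH

for batch in (1, 8, 32, 128):
    step_s = decode_step_latency(batch)
    per_token_ms = step_s * 1e3      # interactivity: latency per generated token
    tokens_per_s = batch / step_s    # system throughput across the whole batch
    print(f"batch={batch:4d}  per-token latency={per_token_ms:8.2f} ms  "
          f"throughput={tokens_per_s:9.0f} tok/s")
```

Under these assumptions, larger batches amortize weight reads and raise throughput, but each generated token takes longer; pushing that Pareto frontier is the core of the role.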
Responsibilities
- Optimize inference deployment by pushing the Pareto frontier of accuracy, throughput and interactivity at datacenter scale.
- Develop high-fidelity performance models to prototype emerging algorithmic techniques and hardware optimizations to drive model-hardware co-design for Generative AI.
- Prioritize features to guide future software and hardware roadmaps based on detailed performance modeling and analysis.
- Model end-to-end performance impact of emerging GenAI workflows (for example: Agentic Pipelines, inference-time compute scaling) to understand future datacenter needs.
- Collaborate across teams including deep learning research, framework development, compiler and systems engineering, and silicon architecture.
- Keep up with the latest deep learning research and apply it to system and deployment-level optimizations.
Requirements
- Master's degree (or equivalent experience) in Computer Science, Electrical Engineering, or related fields.
- 3+ years of hands-on experience in system-level evaluation of AI/ML workloads, or in performance analysis, modeling, and optimization for AI.
- Strong background in computer architecture, roofline modeling, queuing theory, and statistical performance analysis techniques (see the roofline sketch after this list).
- Solid understanding of ML fundamentals, model parallelism and inference serving techniques.
- Proficiency in Python (and optionally C++) for simulator design and data analysis.
- Experience with GPU computing (CUDA).
- Experience with deep learning frameworks and inference engines such as PyTorch, TensorRT-LLM, vLLM, or SGLang.
- Growth mindset and pragmatic "measure, iterate, deliver" approach.
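As a concrete illustration of the roofline modeling mentioned in the requirements, the sketch below picks assumed peak compute and bandwidth figures and classifies two example kernels as compute- or memory-bound. All numbers, kernel names, and arithmetic intensities are assumptions for illustration only, not published GPU specifications.

```python
# Minimal roofline sketch: classify a kernel as compute- or memory-bound from its
# arithmetic intensity. Peak numbers are illustrative assumptions, not published
# specifications for any particular GPU.

PEAK_FLOPS = 1.0e15   # assumed peak math throughput in FLOP/s (e.g. FP16 tensor cores)
PEAK_BW = 3.0e12      # assumed peak HBM bandwidth in bytes/s

def attainable_flops(arithmetic_intensity: float) -> float:
    """Roofline model: performance is capped by either compute or memory traffic."""
    return min(PEAK_FLOPS, arithmetic_intensity * PEAK_BW)

ridge_point = PEAK_FLOPS / PEAK_BW   # FLOP/byte where the memory and compute roofs meet

# Rough, assumed arithmetic intensities (FLOP per byte moved) for two common kernels:
kernels = {
    "decode GEMV (batch 1)": 1.0,         # streams weights once, little reuse
    "prefill GEMM (long prompt)": 400.0,  # reuses weights across many tokens
}

print(f"ridge point ~ {ridge_point:.0f} FLOP/byte")
for name, ai in kernels.items():
    bound = "compute-bound" if ai >= ridge_point else "memory-bound"
    print(f"{name:28s} AI={ai:6.1f}  attainable={attainable_flops(ai):.2e} FLOP/s  ({bound})")
```

This kind of back-of-the-envelope analysis, combined with queuing and statistical methods, is the starting point for the higher-fidelity performance models the role calls for.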
Ways to Stand Out
- Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.
- Proven track record of working in cross-functional teams spanning algorithms, software and hardware architecture.
- Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.
Compensation & Benefits
- Your base salary will be determined based on location, experience, and pay of employees in similar positions.
- Base salary ranges provided by NVIDIA for this role:
- Level 3: 148,000 USD - 235,750 USD
- Level 4: 184,000 USD - 287,500 USD
- You will also be eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).
Additional Information
- Applications for this job will be accepted at least until September 1, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.