Senior Performance Compiler Engineer - Triton

at NVIDIA

📍 Redmond, United States

USD 184,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

LLM, HPC

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Software Development @ 7
Python @ 4
Algorithms @ 4
Machine Learning @ 4
Parallel Programming @ 4
Debugging @ 7
CUDA @ 6
GPU @ 4
Deep Learning @ 4
AI @ 4
OpenCL @ 4
Performance Analysis @ 7

Details

NVIDIA's invention of the GPU in 1999 redefined modern computer graphics and parallel computing and, more recently, fueled modern AI. We are looking for a Senior Performance Compiler Engineer to join the team working on the open-source Triton compiler project. This role focuses on using compilers and low-level optimization to improve AI performance on NVIDIA GPUs, accelerating training and inference for large language models, agents, and other AI applications.

Responsibilities

Investigate the latest and future NVIDIA GPU hardware architecture and programming models.
Work on the frontier of AI by understanding advanced algorithms (like attention sinks and MoEs) and numerics (like block-scaled floating point) to identify new optimization opportunities.
Design and implement compiler technology using MLIR to optimize high-level kernel descriptions written in Triton's Python DSL, focusing on generating efficient low-level GPU code.
When necessary, use inline PTX to hand-tune critical code paths and extract peak hardware performance.
Engage in an iterative optimization process (starting from either the kernels or the compiler) to find the most efficient path to peak performance.
Collaborate with teams across NVIDIA, including hardware architects and the CUDA compiler team, to influence future products and ensure maximum efficiency.

Requirements

Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
8+ years of relevant industry experience in software development.
Strong C++ programming and software design skills, with emphasis on performance analysis and debugging.
Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
Solid understanding of computer architecture and hands-on experience with assembly-level programming.

Ways to stand out from the crowd

Experience tuning BLAS or deep learning library kernels.
Background in numerics and linear algebra.
Experience with machine learning compilers like TVM or MLIR.
Contributions to open-source projects, especially in AI/ML, compilers, or high-performance computing.
Familiarity with the latest research in AI algorithms and numerics, along with a track record of open-source contributions in relevant domains.

Compensation and benefits

Base salary range: 184,000 USD - 287,500 USD.
You will also be eligible for equity and benefits (see NVIDIA benefits page).

Other details

Applications for this job will be accepted at least until May 12, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and is committed to a diverse work environment.