Senior Performance Compiler Engineer - Triton

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 7, Python @ 4, Algorithms @ 4, Machine Learning @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 6, GPU @ 4

Details

We are looking for a Senior Performance Compiler Engineer to join the team working on the open-source Triton compiler project. This role focuses on using compilers and low-level GPU programming to improve AI performance on NVIDIA GPUs, accelerating training and inference for large language models, agents, and other high-impact AI applications. The position is full-time and based at NVIDIA in Santa Clara, CA.

Responsibilities

Investigate current and future NVIDIA GPU hardware architectures and programming models.
Work on the frontier of AI by understanding advanced algorithms (e.g., attention sinks, Mixture of Experts) and numerics (e.g., block-scaled floating point) to identify new optimization opportunities.
Design and implement compiler technology using MLIR to optimize high-level kernel descriptions (written in Triton's Python DSL) and generate efficient low-level GPU code.
When necessary, use inline PTX to hand-tune critical code paths and extract peak hardware performance.
Engage in an iterative optimization process (kernel-first or compiler-first) to reach peak performance.
Collaborate with NVIDIA teams, including hardware architects and the CUDA compiler team, to influence future products and ensure high efficiency.

Requirements

Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
6+ years of relevant industry experience in software development.
Demonstrated strong C++ programming and software design skills, with emphasis on performance analysis and debugging.
Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
Solid understanding of computer architecture and hands-on experience with assembly-level programming.
Familiarity with MLIR and compiler technology, and experience working with Python-based kernel DSLs such as Triton.

Ways to stand out

Experience tuning BLAS or deep learning library kernels.
Background in numerics and linear algebra.
Experience with machine learning compilers such as TVM or MLIR.
Contributions to open-source projects (especially AI/ML, compilers, or HPC).
Familiarity with recent AI research in algorithms and numerics.

Compensation & Benefits

Base salary range: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. Exact base salary will be determined based on location, experience, and pay of employees in similar positions.
Eligible for equity and benefits.

Additional details

Application window: Applications for this job will be accepted at least until August 15, 2025.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.