Senior Performance Compiler Engineer - Triton

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Software Development @ 7, Python @ 4, Algorithms @ 4, Machine Learning @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 6, GPU @ 4

Details

We are looking for a Senior Performance Compiler Engineer to join the team working on the open-source Triton compiler project. This role focuses on using compilers and low-level GPU programming to improve AI performance on NVIDIA GPUs, accelerating training and inference for large language models, agents, and other high-impact AI applications. The position is full-time and based at NVIDIA in Santa Clara, CA.

Responsibilities

  • Investigate the latest and future NVIDIA GPU hardware architecture and programming models.
  • Work on the frontier of AI by understanding advanced algorithms (e.g., attention sinks, Mixture of Experts) and numerics (e.g., block-scaled floating point) to identify new optimization opportunities.
  • Design and implement compiler technology using MLIR to optimize high-level kernel descriptions (written in Triton's Python DSL) and generate efficient low-level GPU code.
  • When necessary, use inline PTX to hand-tune critical code paths and extract peak hardware performance.
  • Engage in an iterative optimization process (kernel-first or compiler-first) to reach peak performance.
  • Collaborate with NVIDIA teams, including hardware architects and the CUDA compiler team, to influence future products and ensure high efficiency.

Requirements

  • Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
  • 6+ years of relevant industry experience in software development.
  • Demonstrated strong C++ programming and software design skills, with emphasis on performance analysis and debugging.
  • Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
  • Solid understanding of computer architecture and hands-on experience with assembly-level programming.
  • Familiarity with MLIR and compiler technology, and experience working with Python-based kernel DSLs (Triton).

Ways to stand out

  • Experience tuning BLAS or deep learning library kernels.
  • Background in numerics and linear algebra.
  • Experience with machine learning compilers such as TVM or MLIR.
  • Contributions to open-source projects (especially AI/ML, compilers, or HPC).
  • Familiarity with recent AI research in algorithms and numerics.

Compensation & Benefits

  • Base salary range: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. Exact base salary will be determined based on location, experience, and pay of employees in similar positions.
  • Eligible for equity and benefits.

Additional details

  • Application window: Applications for this job will be accepted at least until August 15, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.