Senior Performance Compiler Engineer - Triton

at NVIDIA
USD 184,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

LLM, HPC

Required Skills & Competences

Software Development @ 7, Python @ 4, Algorithms @ 4, Machine Learning @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 6, GPU @ 4, Deep Learning @ 4, AI @ 4, OpenCL @ 4, Performance Analysis @ 7

Details

NVIDIA's invention of the GPU in 1999 redefined modern computer graphics and parallel computing and, more recently, fueled modern AI. We are looking for a Senior Performance Compiler Engineer to join the team working on the open-source Triton compiler project. This role focuses on using compilers and low-level optimization to improve AI performance on NVIDIA GPUs, accelerating training and inference for large language models, agents, and other AI applications.

Responsibilities

  • Investigate current and future NVIDIA GPU hardware architectures and programming models.
  • Work on the frontier of AI by understanding advanced algorithms (like attention sinks and MoEs) and numerics (like block-scaled floating point) to identify new optimization opportunities.
  • Design and implement compiler technology using MLIR to optimize high-level kernel descriptions written in Triton's Python DSL, focusing on generating efficient low-level GPU code.
  • When necessary, use inline PTX to hand-tune critical code paths and extract peak hardware performance.
  • Engage in an iterative optimization process (starting from either the kernels or the compiler) to find the most efficient path to peak performance.
  • Collaborate with teams across NVIDIA, including hardware architects and the CUDA compiler team, to influence future products and ensure maximum efficiency.

Requirements

  • Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
  • 8+ years of relevant industry experience in software development.
  • Strong C++ programming and software design skills, with emphasis on performance analysis and debugging.
  • Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
  • Solid understanding of computer architecture and hands-on experience with assembly-level programming.

Ways to stand out from the crowd

  • Experience tuning BLAS or deep learning library kernels.
  • Background in numerics and linear algebra.
  • Experience with machine learning compilers like TVM or MLIR.
  • Contributions to open-source projects, especially in AI/ML, compilers, or high-performance computing.
  • Familiarity with the latest research in AI algorithms and numerics and a track record of open-source contributions in relevant domains.

Compensation and benefits

  • Base salary range: 184,000 USD - 287,500 USD.
  • You will also be eligible for equity and benefits (see NVIDIA benefits page).

Other details

  • Applications for this job will be accepted at least until May 12, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and is committed to a diverse work environment.