Senior Performance Compiler Engineer - Triton

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Software Development @ 7, Python @ 4, Algorithms @ 4, Machine Learning @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 6, GPU @ 4

Details

We are seeking a Senior Performance Compiler Engineer to work on the open-source Triton compiler project to improve AI performance on NVIDIA GPUs. The role focuses on using compiler technology, kernel optimization, and hardware-aware tuning to accelerate both training and inference for large language models, agents, and other high-impact AI applications.

Responsibilities

  • Investigate current and upcoming NVIDIA GPU hardware architectures and programming models.
  • Work on the frontier of AI by understanding advanced algorithms (e.g., attention sinks, MoEs) and numerics (e.g., block-scaled floating point) to identify new optimization opportunities.
  • Design and implement compiler technology using MLIR to optimize high-level kernel descriptions written in Triton's Python DSL and generate efficient low-level GPU code.
  • When necessary, use inline PTX to hand-tune critical code paths and extract peak performance from the hardware.
  • Engage in an iterative optimization process that may start from the kernel or from the compiler to find the most efficient path to peak performance.
  • Collaborate with teams across NVIDIA, including hardware architects and the CUDA compiler team, to influence future products and ensure maximum efficiency.

Requirements

  • Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
  • 6+ years of relevant industry experience in software development.
  • Strong C++ programming and software design skills, with an emphasis on performance analysis and debugging.
  • Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
  • Solid understanding of computer architecture and hands-on experience with assembly-level programming.

Ways to stand out from the crowd

  • Experience tuning BLAS or deep learning library kernels.
  • Background in numerics and linear algebra.
  • Experience with machine learning compilers such as TVM or MLIR.
  • Contributions to open-source projects, especially in AI/ML, compilers, or high-performance computing.
  • Familiarity with the latest research in AI algorithms and numerics and a strong track record of open-source contributions in relevant domains.

Compensation & Benefits

  • Base salary range (by level):
    • Level 4: 184,000 USD - 287,500 USD
    • Level 5: 224,000 USD - 356,500 USD
  • You will also be eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).

Additional information

  • Full-time role.
  • Applications are accepted until at least August 15, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.