Required Skills & Competences
Software Development @ 7, Python @ 4, Algorithms @ 4, Machine Learning @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 6, GPU @ 4
Details
We are seeking a Senior Performance Compiler Engineer to work on the open-source Triton compiler project to improve AI performance on NVIDIA GPUs. The role focuses on using compiler technology, kernel optimization, and hardware-aware tuning to accelerate both training and inference for large language models, agents, and other high-impact AI applications.
Responsibilities
- Investigate the latest and future NVIDIA GPU hardware architecture and programming models.
- Work on the frontier of AI by understanding advanced algorithms (e.g., attention sinks, MoEs) and numerics (e.g., block-scaled floating point) to identify new optimization opportunities.
- Design and implement compiler technology using MLIR to optimize high-level kernel descriptions written in Triton's Python DSL and generate efficient low-level GPU code.
- When necessary, use inline PTX to hand-tune critical code paths and extract peak performance from the hardware.
- Engage in an iterative optimization process that may start from the kernel or from the compiler to find the most efficient path to peak performance.
- Collaborate with teams across NVIDIA, including hardware architects and the CUDA compiler team, to influence future products and ensure maximum efficiency.
Requirements
- Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
- 6+ years of relevant industry experience in software development.
- Strong C++ programming and software design skills, with an emphasis on performance analysis and debugging.
- Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
- Solid understanding of computer architecture and hands-on experience with assembly-level programming.
Ways to stand out from the crowd
- Experience tuning BLAS or deep learning library kernels.
- Background in numerics and linear algebra.
- Experience with machine learning compilers such as TVM or MLIR.
- Contributions to open-source projects, especially in AI/ML, compilers, or high-performance computing.
- Familiarity with the latest research in AI algorithms and numerics and a strong track record of open-source contributions in relevant domains.
Compensation & Benefits
- Base salary range (by level):
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- You will also be eligible for equity and benefits (see: https://www.nvidia.com/en-us/benefits/).
Additional information
- Full-time role.
- Applications accepted at least until August 15, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.