Used Tools & Technologies
Not specified
Required Skills & Competences
Software Development @ 7 Python @ 4 Algorithms @ 4 Machine Learning @ 4 Parallel Programming @ 4 Debugging @ 7 CUDA @ 6 GPU @ 4
Details
We are looking for a Senior Performance Compiler Engineer to join the team working on the open-source Triton compiler project. This role focuses on using compilers and low-level GPU programming to improve AI performance on NVIDIA GPUs, accelerating training and inference for large language models, agents, and other high-impact AI applications. The position is full-time and based at NVIDIA in Santa Clara, CA.
Responsibilities
- Investigate current and future NVIDIA GPU hardware architectures and programming models.
- Work on the frontier of AI by understanding advanced algorithms (e.g., attention sinks, Mixture of Experts) and numerics (e.g., block-scaled floating point) to identify new optimization opportunities.
- Design and implement compiler technology using MLIR to optimize high-level kernel descriptions (written in Triton's Python DSL) and generate efficient low-level GPU code.
- When necessary, use inline PTX to hand-tune critical code paths and extract peak hardware performance.
- Engage in an iterative optimization process (kernel-first or compiler-first) to reach peak performance.
- Collaborate with NVIDIA teams, including hardware architects and the CUDA compiler team, to influence future products and ensure high efficiency.
Requirements
- Bachelor's, Master's, or Ph.D. degree (or equivalent experience) in Computer Science, Computer Engineering, Applied Math, or a related field.
- 6+ years of relevant industry experience in software development.
- Demonstrated strong C++ programming and software design skills, with emphasis on performance analysis and debugging.
- Experience in parallel programming, including CUDA/OpenCL GPU programming or other parallel models such as OpenMP.
- Solid understanding of computer architecture and hands-on experience with assembly-level programming.
- Familiarity with MLIR and compiler technology, and experience working with Python-based kernel DSLs such as Triton.
Ways to stand out
- Experience tuning BLAS or deep learning library kernels.
- Background in numerics and linear algebra.
- Experience with machine learning compilers or compiler infrastructure such as TVM or MLIR.
- Contributions to open-source projects (especially AI/ML, compilers, or HPC).
- Familiarity with recent AI research in algorithms and numerics.
Compensation & Benefits
- Base salary range: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. Exact base salary will be determined based on location, experience, and pay of employees in similar positions.
- Eligible for equity and benefits.
Additional details
- Application window: Applications for this job will be accepted at least until August 15, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.