Senior DL Compiler Engineer - CUDA Tile

at Nvidia

📍 Santa Clara, United States

USD 152,000-241,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

LLM GenAI

Required Skills & Competences

Algorithms @ 4, Hiring @ 4, Performance Optimization @ 4, Debugging @ 4, API @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 4, Generative AI @ 4, AI @ 4, OpenCL @ 4, Performance Analysis @ 4, LLVM @ 4

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today NVIDIA is tapping into the potential of AI to define the next era of computing. NVIDIA GPUs are at the center of the deep learning revolution and continue to enable breakthroughs in generative AI, large language models, recommendation systems, speech recognition, image classification, and other areas.

Role overview

We are hiring software engineers for the CUDA Tile team. In this role, you will work on CUDA Tile, a new tile-based programming model for NVIDIA GPUs (shipped with CUDA 13.1). You will design and implement compiler transformations, develop MLIR-based dialects and lowering passes, and optimize tile-based kernels so they execute efficiently across multiple generations of NVIDIA GPU architectures. The scope includes defining public APIs, implementing compiler and optimization techniques, optimizing performance, and other general software engineering work.

Responsibilities

Design and implement compiler transformations and optimization passes for tile-based kernels.
Develop MLIR-based dialects and lowering passes.
Optimize performance of kernels across multiple NVIDIA GPU architectures.
Define and implement public APIs related to the tile programming model.
Perform performance analysis, debugging, and test design for compiler-generated code.
Work as part of a product-oriented team and drive development efforts independently.

Requirements

Bachelor's, Master's, or Ph.D. in Computer Science, Computer Engineering, or a related field (or equivalent experience).
3+ years of relevant work or research experience in compiler optimization, performance analysis, and IR design.
Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
Ability to work independently, define project goals and scope, and lead your own development effort.
Strong interpersonal skills and ability to collaborate in a dynamic product-oriented team.

Ways to stand out

Knowledge of CPU and/or GPU architecture.
CUDA or OpenCL programming experience.
Experience with MLIR, LLVM, XLA, or TVM.
Experience with deep learning models and algorithms.

Benefits and compensation

Base salary range: 152,000 USD to 241,500 USD per year (determined by location, experience, and the pay of employees in similar positions).
Eligibility for equity and benefits (see company benefits).

Additional information

Applications for this job will be accepted at least until June 20, 2026.
This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.