Senior Deep Learning Compiler Engineer - XLA

at NVIDIA

📍 Santa Clara, United States

USD 152,000-241,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Algorithms @ 4, TensorFlow @ 4, Mentoring @ 1, Debugging @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 7, AI @ 7, OpenCL @ 4, Performance Analysis @ 4

Details

NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. NVIDIA is increasingly known as “the AI computing company.”

We are looking for versatile software engineers for our XLA team to build high-performance, production-grade software that is at the core of next-generation AI systems.

Responsibilities

Develop compiler optimization algorithms for deep learning workloads.
Optimize inference and training performance for the JAX framework and the OpenXLA compiler on NVIDIA GPUs at scale.
Craft and implement compiler optimization techniques for deep learning network graphs.
Design novel graph partitioning and tensor sharding techniques for distributed training and inference.
Tune and analyze performance.
Implement code generation for NVIDIA GPU backends using open-source compilers such as MLIR, LLVM, and OpenAI Triton.
Design user-facing features in JAX and related libraries and perform general software engineering work.
Collaborate closely with GPU hardware engineering teams and deep learning framework partners to design AI compiler software features for next-generation GPUs.

Requirements

Bachelor's, Master's, or Ph.D. in Computer Science, Computer Engineering, or a related field (or equivalent experience).
4+ years of relevant work or research experience in performance analysis and compiler optimizations.
Ability to work independently, define project goals and scope, and lead development efforts following clean software engineering and testing practices.
Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
Strong foundation in the architecture of CPUs, GPUs, or other high-performance hardware accelerators; knowledge of high-performance computing and distributed programming.
CUDA or OpenCL programming experience is desired but not required.
Experience with any of the following is a huge plus: XLA, TVM, MLIR, LLVM, OpenAI Triton, deep learning models and algorithms, and deep learning framework design.
Strong interpersonal skills and the ability to work in a dynamic, product-oriented team. Mentoring experience is a bonus.

Ways to Stand Out

Experience working with deep learning frameworks such as JAX, PyTorch, or TensorFlow.
Extensive experience with CUDA or GPUs in general.
Experience with open-source compilers such as XLA, LLVM, MLIR, or TVM.

Compensation & Additional Information

Base salary range: 152,000 USD - 241,500 USD (base salary will be determined based on location, experience, and pay of employees in similar positions).
Eligible for equity and benefits.
Applications accepted at least until March 1, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to fostering a diverse work environment.

#deeplearning