Senior Deep Learning Compiler Engineer - XLA

at NVIDIA

📍 Santa Clara, United States

USD 148,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Algorithms @ 4, TensorFlow @ 4, Mentoring @ 1, Debugging @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

Responsibilities

Develop compiler optimization algorithms for deep learning workloads.
Optimize inference and training performance for the JAX framework and the OpenXLA compiler on NVIDIA GPUs at scale.
Collaborate with deep learning framework teams and hardware architecture teams to accelerate next-generation deep learning software.
Craft and implement compiler optimization techniques for deep learning network graphs.
Design novel graph partitioning and tensor sharding techniques for distributed training and inference.
Perform performance tuning and analysis.
Implement code generation for NVIDIA GPU backends using open-source compiler technologies such as MLIR, LLVM, and OpenAI Triton.
Design user-facing features in JAX and related libraries, and contribute to other software engineering work.
Work closely with GPU hardware engineering teams to design AI compiler software features for next-generation GPUs.

Requirements

Bachelor’s, Master’s, or Ph.D. in Computer Science, Computer Engineering, or a related field, or equivalent experience.
4+ years of relevant work or research experience in performance analysis and compiler optimizations.
Ability to work independently, define project goals and scope, and lead development efforts that adopt clean software engineering and testing practices.
Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
Strong foundation in the architecture of CPUs, GPUs, or other high-performance hardware accelerators.
Knowledge of high-performance computing and distributed programming.
CUDA or OpenCL programming experience is desired but not required.
Experience with XLA, TVM, MLIR, LLVM, OpenAI Triton, deep learning models and algorithms, and deep learning framework design is a huge plus.
Strong interpersonal skills and the ability to work dynamically in a product-oriented team.
A history of mentoring junior engineers and interns is a bonus.

Ways to Stand Out

Experience working with deep learning frameworks such as JAX, PyTorch, or TensorFlow.
Extensive experience with CUDA or GPUs in general.
Experience with open-source compilers such as XLA, LLVM, MLIR, or TVM.

Benefits

Highly competitive salaries, comprehensive benefits package, equity eligibility, and the opportunity to work in cutting-edge fields including Virtual Reality, Artificial Intelligence, Deep Learning, and Autonomous Vehicles.