Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
Algorithms @ 4
Hiring @ 4
Mentoring @ 4
Debugging @ 4
API @ 4
PyTorch @ 4
CUDA @ 6
GPU @ 4
Deep Learning @ 7
AI @ 7
Robotics @ 4
OpenCL @ 6
Performance Analysis @ 4
Details
NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company”.
We are hiring software engineers for the Deep Learning & AI Compiler (DLC) team. Our DLC is the backbone of NVIDIA’s inference engine across data centers, personal devices, automotive, and robotics. The compiler must deliver leading inference performance, fast build time, reduced memory footprints, and ease of use in both Ahead-of-Time and Just-in-Time modes.
Responsibilities
- Analyze deep learning networks and develop compiler optimization algorithms.
- Collaborate with deep learning software framework teams and GPU architecture teams to accelerate next-generation deep learning software.
- Define public APIs, implement performance optimizations and analysis, and craft compiler techniques for AI workloads and future NVIDIA GPUs.
Requirements
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Computer Engineering, a related field, or equivalent experience.
- 3+ years of relevant work or research experience in performance analysis and compiler optimizations.
- Experience with compiler technologies such as MLIR, LLVM, XLA, or Triton.
- Excellent C/C++ and Python programming and software design skills, including debugging, performance analysis, and test design.
- Ability to work independently, define project goals and scope, and lead your own development efforts.
- Strong interpersonal skills and ability to work in a dynamic product-oriented team.
Ways to stand out from the crowd
- Proficiency in CPU and/or GPU architecture; CUDA or OpenCL programming experience.
- Understanding of deep learning models, algorithms, and frameworks such as PyTorch and JAX.
- GPU kernel authoring and performance analysis using tools such as Nsight Compute.
- Experience mentoring early-career engineers and interns.
- Track record of new hardware bring-up.
Compensation and Benefits
- Base salary range: 152,000 USD - 241,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits.
Applications for this job will be accepted at least until February 27, 2026. NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to fostering a diverse work environment.