Senior Deep Learning Compiler Engineer - PyTorch

at NVIDIA

📍 Berlin, Germany

PLN 292,500-507,000 per year

SENIOR

✅ On-site

Used Tools & Technologies

GPU

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About the proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 7, GitHub @ 4, Distributed Systems @ 3, Communication @ 4, Parallel Programming @ 3, PyTorch @ 4, CUDA @ 4, Deep Learning @ 4, AI @ 4, Performance Analysis @ 4

Details

Join us at the forefront of AI compiler technology and help shape the future of accelerated computing. NVIDIA is seeking passionate engineers to build the next generation of tools used by AI developers and researchers worldwide. Our team is developing Thunder (https://github.com/Lightning-AI/lightning-thunder), an ambitious source-to-source compiler built to unlock outstanding performance for PyTorch models on NVIDIA GPUs. This is a unique opportunity to contribute to a project that enhances the PyTorch ecosystem, working with modern compiler stacks like PyTorch 2.0's TorchDynamo and TorchInductor to create powerful, open-source solutions that benefit the entire community.

Responsibilities

Lead the design, implementation, optimization, and maintenance of core compiler technologies that accelerate large deep learning workloads.
Collaborate with engineers who built PyTorch for NVIDIA hardware and work closely with compiler, library, and systems teams (including nvFuser, TVM, XLA, and CUDA) to translate research into practical, high-impact solutions.
Perform deep performance analysis on workloads running at scale (thousands of GPUs) to find optimization opportunities that will shape Thunder's design.
Contribute to a vibrant open-source ecosystem and help pioneer new framework features.

Requirements

Bachelor's, Master's, or Ph.D. in Computer Science or a related technical field (or equivalent experience).
8+ years of relevant work experience.
Strong command of Python and experience building complex, well-tested software systems.
Hands-on experience with deep learning frameworks such as PyTorch or JAX; understanding of model construction and performance bottlenecks.
Solid foundation in compiler concepts, including abstract syntax trees (ASTs), intermediate representations (e.g., SSA form), program analysis, and code generation.
Excellent communication and collaboration skills for working effectively in a distributed, open-source environment.

Preferred / Ways to stand out

Previous contributions to deep learning compiler projects (e.g., TVM, MLIR, IREE) or deep learning frameworks.
Deep expertise in PyTorch internals, particularly the compiler stack (TorchDynamo, TorchInductor).
Experience with JAX-like functional transformations in a compiler context.
Familiarity with parallel programming, distributed systems, and writing high-performance CUDA code.
Track record of impactful participation in open-source communities through code contributions, design discussions, or mentorship.

About NVIDIA

NVIDIA is at the forefront of breakthroughs in Artificial Intelligence, High-Performance Computing, and Visualization. Teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. NVIDIA offers competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, NVIDIA is committed to fostering a supportive and empowering workplace for all.

Compensation

For Poland: The base salary range is 292,500 PLN - 507,000 PLN per year.