Senior AI Frameworks Engineer

at NVIDIA

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

HPC

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Python @ 4 Parallel Programming @ 4 Debugging @ 4 API @ 4 CUDA @ 4 GPU @ 4 Deep Learning @ 4 AI @ 4 LLVM @ 4

Details

NVIDIA's high-performance computing platforms power AI across many applications and industries. Within our software stack, CUTLASS is an open-source ecosystem dedicated to high-performance math primitives. We are building the next frontier: Pythonic CUTLASS (CUTLASS DSL) to bring high-performance abstractions and "speed-of-light" performance directly into the Python environment. Join the CUTLASS team to bridge low-level hardware primitives and high-level developer productivity.

Responsibilities

Design APIs that prioritize user productivity and provide a "native" feel for developers used to modern scientific computing and deep learning frameworks.
Develop robust compilation infrastructure, including AST transformations and JIT-friendly execution, to lower Pythonic descriptions into high-performance GPU machine code.
Improve the developer experience by creating debugging tools, profiler integrations, and validation methodologies for kernel development and usage.
Build production-grade delivery infrastructure for the open-source community, managing package distribution (wheels, conda), user-facing documentation, and testing.
Contribute as a core member of the CUTLASS project to ship GPU programming and kernel development tools.

Requirements

MS or PhD in Computer Science, Electrical Engineering, or related field (or equivalent experience).
3+ years of relevant experience.
Strong proficiency in Python and C++, particularly in designing Python extensions and foreign function interfaces (FFI).
Experience in library or framework development with a focus on intuitive API design for complex technical systems.
Deep understanding of the Python ecosystem's delivery stack, including building, testing, and distributing high-performance compiled extensions (wheels, conda).

Ways to stand out

Active maintainership of, or significant contributions to, high-performance open-source libraries, AI frameworks, or compiler projects (for example LLVM/MLIR).
Understanding of compiler foundations such as intermediate representations (IR), lowering passes, or AST manipulation.
Experience with GPU architecture and parallel programming models (CUDA).

Compensation and benefits

Base salary ranges (determined by location and experience):
- Level 3: 152,000 USD - 241,500 USD per year
- Level 4: 184,000 USD - 287,500 USD per year
Eligible for equity and benefits.

Additional information

Location specified: Santa Clara, CA, United States.
Employment type: Full time.
Applications accepted at least until May 9, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.