Used Tools & Technologies
HPC
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others, and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 4
Parallel Programming @ 4
Debugging @ 4
API @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
LLVM @ 4
Details
NVIDIA's high-performance computing platforms power AI across many applications and industries. Within our software stack, CUTLASS is an open-source ecosystem dedicated to high-performance math primitives. We are building the next frontier: Pythonic CUTLASS (CUTLASS DSL) to bring high-performance abstractions and "speed-of-light" performance directly into the Python environment. Join the CUTLASS team to bridge low-level hardware primitives and high-level developer productivity.
Responsibilities
- Design APIs that prioritize user productivity and provide a "native" feel for developers used to modern scientific computing and deep learning frameworks.
- Develop robust compilation infrastructure, including AST transformations and JIT-friendly execution, to lower Pythonic descriptions into high-performance GPU machine code.
- Optimize developer experience by creating debugging tools, profiler integrations, and validation methodologies for kernel development and usage.
- Build production-grade delivery infrastructure for the open-source community, managing package distribution (wheels, conda), user-facing documentation, and testing.
- Contribute as a core member of the CUTLASS project to deliver GPU programming and kernel delivery tools.
Requirements
- MS or PhD in Computer Science, Electrical Engineering, or related field (or equivalent experience).
- At least 3 years of relevant experience.
- Strong proficiency in Python and C++, specifically regarding the design of Python extensions and foreign function interfaces (FFI).
- Experience in library or framework development with a focus on intuitive API design for complex technical systems.
- Deep understanding of the Python ecosystem's delivery stack, including building, testing, and distributing high-performance compiled extensions (wheels, conda).
Ways to stand out
- Active maintainer status or significant contributions to high-performance open-source libraries, AI frameworks, or compiler projects (for example LLVM/MLIR).
- Understanding of compiler foundations such as intermediate representations (IR), lowering passes, or AST manipulation.
- Experience with GPU architecture and parallel programming models (CUDA).
Compensation and benefits
- Base salary ranges (determined by location and experience):
  - Level 3: 152,000 USD - 241,500 USD per year
  - Level 4: 184,000 USD - 287,500 USD per year
- Eligible for equity and benefits (link to NVIDIA benefits provided in original posting).
Additional information
- Location specified: Santa Clara, CA, United States.
- Employment type: Full time.
- Applications accepted at least until May 9, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.