Used Tools & Technologies
HPC
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
Algorithms @ 4
Debugging @ 7
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Details
NVIDIA's high-performance computing platforms power AI across many applications. CUTLASS is an open-source ecosystem (C++ and Python abstractions) for high-performance linear algebra and Tensor Core primitives used to implement custom matrix multiply (GEMM) and related deep learning computations on NVIDIA GPUs.
Responsibilities
- Develop core components of the CUTLASS platform including Tensor Core MMAs, copies, synchronization barriers, schedulers, and other GPU hardware features in CUDA C++ and CUTLASS Python DSL.
- Contribute to the MLIR-based backend compiler stack for the CUTLASS Python DSL by designing dialects and compiler passes.
- Author high-performance example kernels using CUTLASS abstractions to showcase novel GPU hardware features.
- Collaborate with GPU architecture, CUDA, and NVVM/PTX compiler teams to provide feedback on programming models and assess performance of future GPU hardware features.
Requirements
- Master's or PhD in Computer Science, Computer Engineering, or a related field (or equivalent experience).
- 3+ years of relevant industry experience.
- Strong proficiency in C++ programming and software design, including debugging, performance evaluation, and testing.
- Experience with high-performance code generation and knowledge of compiler transformations and optimizations.
- Deep understanding of computer architecture and parallel computing programming models.
Ways to Stand Out
- Experience writing high-performance kernels at low levels of abstraction (NVVM/PTX for GPUs or similar parallel processing architectures).
- Hands-on compiler design experience, particularly in MLIR.
- Understanding of deep learning models, algorithms, and frameworks.
Compensation & Benefits
- Base salary ranges (determined by location and experience):
  - Level 3: 152,000 USD - 241,500 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and NVIDIA benefits.
Additional Information
- Applications accepted at least until June 5, 2026.
- This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.