Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
Algorithms @ 4
Communication @ 4
API @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
HPC @ 4
Details
NVIDIA’s accelerated computing platform is the foundation of modern HPC and AI. At the core of this platform are the CUDA Core Libraries: C++ and Python libraries that enable developers to write fast, reliable, and scalable GPU-accelerated software. This is a full-time Software Engineer position working on the CUDA Core Libraries, including CCCL (Thrust, CUB, libcudacxx), cuda-python, and numba-cuda. The work involves building foundational libraries, algorithms, and language/runtime infrastructure for GPU computing across deep learning, scientific computing, and data analytics.
Responsibilities
- Develop and implement CUDA Core Libraries in C++ and/or Python, including parallel algorithms and idiomatic language bindings for core CUDA functionality.
- Compose, optimize, and evolve GPU algorithms and APIs, from high-level interfaces down to low-level performance tuning involving memory, parallelism, and synchronization.
- Own features end-to-end: development, implementation, testing, benchmarking, documentation, and long-term maintenance.
- Improve developer experience across the stack: CI, tests, benchmarks, packaging, examples, and docs.
- Collaborate with senior CUDA engineers in design reviews, code reviews, and open-source-style workflows.
- Engage with real users through issues, performance investigations, and API feedback.
Requirements
- BS, MS, or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- At least 8 years of related development experience.
- Strong programming skills in C++, Python, or both, with proven interest in systems-level software (performance, memory, concurrency, API design).
- Solid understanding of modern C++ (templates, generics, standard library) and/or Python library development and packaging.
- Practical experience with parallel or heterogeneous programming (CUDA, OpenMP, GPU-accelerated Python, or similar).
- Experience contributing to production software or open-source libraries, including testing, profiling, and code review.
- Ability to work independently, scope problems, and drive projects to completion.
- Clear written communication for technical design and documentation.
- Comfort navigating large, multi-language codebases (C++, Python, CMake, Pixi, CI systems).
Ways to stand out
- Strong understanding of CPU/GPU architecture and how hardware details affect performance.
- Hands-on experience with CUDA C++, CUDA Python, PyTorch, JAX, Numba, CuPy, or similar GPU-accelerated stacks.
- Familiarity with Thrust, CUB, libcudacxx, or other modern C++/GPU libraries.
- Experience with compiler infrastructure or tooling (LLVM, Clang tooling, MLIR).
- Demonstrated interest in developer tools, library design, and making other developers faster.
Compensation & Benefits
- Base salary ranges (location and level dependent):
  - Level 4: 184,000 USD – 287,500 USD
  - Level 5: 224,000 USD – 356,500 USD
- Eligible for equity and benefits.
Other information
- Applications for this job will be accepted at least until March 19, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.