Senior Software Engineer, CUDA Core Libraries

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Python @ 4, Algorithms @ 4, Communication @ 4, API @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 4, AI @ 4, Profiling @ 4, HPC @ 4

Details

NVIDIA’s accelerated computing platform is the foundation of modern HPC and AI. At the core of this platform are the CUDA Core Libraries: C++ and Python libraries that enable developers to write fast, reliable, and scalable GPU-accelerated software. This is a full-time Software Engineer position working on CUDA Core Libraries projects such as CCCL (Thrust, CUB, and libcudacxx), cuda-python, and numba-cuda. The work involves building foundational libraries, algorithms, and language/runtime infrastructure for GPU computing across deep learning, scientific computing, and data analytics.

Responsibilities

  • Develop and implement CUDA Core Libraries in C++ and/or Python, including parallel algorithms and idiomatic language bindings for core CUDA functionality.
  • Compose, optimize, and evolve GPU algorithms and APIs, from high-level interfaces down to low-level performance tuning involving memory, parallelism, and synchronization.
  • Own features end-to-end: development, implementation, testing, benchmarking, documentation, and long-term maintenance.
  • Improve developer experience across the stack: CI, tests, benchmarks, packaging, examples, and docs.
  • Collaborate with senior CUDA engineers in design reviews, code reviews, and open-source-style workflows.
  • Engage with real users through issues, performance investigations, and API feedback.

Requirements

  • BS, MS, or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • Minimum of 8 years of related development experience.
  • Strong programming skills in C++, Python, or both, with proven interest in systems-level software (performance, memory, concurrency, API design).
  • Solid understanding of modern C++ (templates, generics, standard library) and/or Python library development and packaging.
  • Practical experience with parallel or heterogeneous programming (CUDA, OpenMP, GPU-accelerated Python, or similar).
  • Experience contributing to production software or open-source libraries, including testing, profiling, and code review.
  • Ability to work independently, scope problems, and drive projects to completion.
  • Clear written communication for technical design and documentation.
  • Comfort navigating large, multi-language codebases (C++, Python, CMake, Pixi, CI systems).

Ways to stand out

  • Strong understanding of CPU/GPU architecture and how hardware details affect performance.
  • Hands-on experience with CUDA C++, CUDA Python, PyTorch, JAX, Numba, CuPy, or similar GPU-accelerated stacks.
  • Familiarity with Thrust, CUB, libcudacxx, or other modern C++/GPU libraries.
  • Experience with compiler infrastructure or tooling (LLVM, Clang tooling, MLIR).
  • Demonstrated interest in developer tools, library design, and making other developers faster.

Compensation & Benefits

  • Base salary ranges (location and level dependent):
    • Level 4: 184,000 USD – 287,500 USD
    • Level 5: 224,000 USD – 356,500 USD
  • Eligible for equity and benefits.

Other information

  • Applications for this job will be accepted at least until March 19, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.