Senior Math Libraries Engineer, CPU and GPU Optimization

at NVIDIA
USD 224,000-425,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Python @ 3, CI/CD @ 4, Communication @ 7, Parallel Programming @ 4, Product Management @ 4, API @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is looking for an expert software engineer to help deliver CUDA-X libraries across the NVIDIA CPU and GPU ecosystem. You will join a team designing, developing, and optimizing math libraries used across HPC and AI applications, with a focus on performance, portability, and modern APIs.

Responsibilities

  • Design modern, flexible, and easy-to-use APIs and kernels for math libraries and lead design reviews with collaborators.
  • Work closely with internal teams (Engineering, Product Management) and external partners (researchers, customers) to understand use cases and requirements.
  • Deliver timely math library releases to internal and external customers.
  • Continuously survey trends in software systems to become a domain expert and guide the direction of the libraries.

Requirements

  • PhD or MSc degree in Computer Science, Applied Math, or a related science or engineering field preferred (or equivalent experience).
  • 12+ years of experience designing and developing software for high-performance computing and/or AI applications.
  • Advanced C++ skills, including modern design paradigms (e.g., template metaprogramming, RAII).
  • Parallel programming experience with CUDA, OpenCL, or vector programming on CPU (AVX, NEON or similar).
  • Experience with CPU architectures such as ARM, RISC-V and/or x86_64.
  • Strong collaboration, communication, and documentation habits.

Preferred / Ways to Stand Out

  • Strong background in numerical methods (e.g., FFT, numerical linear algebra).
  • Python programming skills and familiarity with high-level ecosystems (e.g., NumPy, JAX, MLIR).
  • Experience with modern build and test automation (e.g., CMake, CI/CD, sanitizers).
  • Experience with cross-compilation and setting up CPU/GPU/accelerator cross-compilation toolchains.
  • Experience with CCCL, OpenMP, OpenACC, multi-threading, MPI, or PGAS.
  • Experience with scientific and deep learning libraries and frameworks such as PyTorch, JAX, MKL, MAGMA, PETSc, Kokkos.

Compensation and Benefits

  • Base salary range (determined by location, experience, and the pay of employees in similar positions):
    • Level 5: 224,000 USD - 356,500 USD
    • Level 6: 272,000 USD - 425,500 USD
  • You will also be eligible for equity and benefits.

Other Information

  • Applications for this job will be accepted at least until September 29, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.