Senior Performance Software Engineer, Deep Learning Libraries

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

CI/CD @ 4, Algorithms @ 4, TensorFlow @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 4, GPU @ 4

Details

We are looking for a Senior Performance Software Engineer for deep learning libraries who enjoys tuning parallel algorithms and analyzing their performance. You will develop optimized code to accelerate linear algebra and deep learning operations on NVIDIA GPUs. The team delivers high-performance code to NVIDIA's cuDNN, cuBLAS, and TensorRT libraries and works on code close to the GPU hardware. The team focuses on peak GPU efficiency for current and future-generation GPUs and contributes to open-source projects such as CUTLASS, which showcases performant matrix multiplication on NVIDIA Tensor Cores.

Responsibilities

  • Write highly tuned compute kernels, mostly in C++ and CUDA, to perform core deep learning operations (e.g., matrix multiplies, convolutions, normalizations).
  • Follow software engineering best practices including regression testing and CI/CD flows.
  • Collaborate with other teams including:
    • CUDA compiler team on generating optimal assembly code.
    • Deep learning training and inference performance teams to prioritize layers for optimization.
    • Hardware and architecture teams on programming models for new deep learning hardware features.

Requirements

  • Master's or PhD degree or equivalent experience in Computer Science, Computer Engineering, Applied Math, or a related field.
  • 6+ years of relevant industry experience.
  • Strong C++ programming and software design skills, including debugging, performance analysis, and test design.
  • Experience with performance-oriented parallel programming (GPU or CPU), e.g., OpenMP or pthreads.
  • Solid understanding of computer architecture and some experience with assembly programming.

Ways to stand out

  • Experience tuning BLAS or deep learning library kernel code.
  • CUDA or OpenCL GPU programming experience.
  • Strong background in numerical methods and linear algebra.
  • Experience with LLVM, TVM tensor expressions, or TensorFlow MLIR.

Benefits

  • Eligibility for equity and NVIDIA benefits.

Compensation and location

  • Base salary ranges (dependent on location, experience, and level):
    • Level 4: 184,000 USD - 287,500 USD
    • Level 5: 224,000 USD - 356,500 USD
  • Location: Santa Clara, CA, United States.

Additional information

  • Applications accepted at least until July 29, 2025.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.

Technologies and topics mentioned

C++, CUDA, GPU programming, cuDNN, cuBLAS, TensorRT, CUTLASS, Tensor Cores, matrix multiply, convolutions, normalizations, BLAS, OpenCL, OpenMP, pthreads, assembly, LLVM, TVM, TensorFlow MLIR, numerical methods, linear algebra, performance analysis, profiling, regression testing, CI/CD, software design, debugging.