Senior Performance Software Engineer, Deep Learning Libraries
at NVIDIA
USD 184,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
CI/CD @ 4, Algorithms @ 4, TensorFlow @ 4, Parallel Programming @ 4, Debugging @ 7, CUDA @ 4, GPU @ 4
Details
We are looking for a Senior Performance Software Engineer for Deep Learning Libraries who enjoys tuning parallel algorithms and analyzing performance. You will develop optimized code to accelerate linear algebra and deep learning operations on NVIDIA GPUs, delivering high-performance code to libraries such as cuDNN, cuBLAS, and TensorRT. This position works low in the deep learning software stack, close to GPU hardware, and focuses on achieving peak GPU efficiency on current and future-generation GPUs.
Responsibilities
- Write highly tuned compute kernels, mostly in C++ CUDA, to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations).
- Follow software engineering best practices, including support for regression testing and CI/CD flows.
- Collaborate with other teams across NVIDIA, including:
  - CUDA compiler team on generating optimal assembly code.
  - Deep learning training and inference performance teams to identify layers that require optimization.
  - Hardware and architecture teams on the programming model for new deep learning hardware features.
Requirements
- Master’s or PhD degree or equivalent experience in Computer Science, Computer Engineering, Applied Math, or a related field.
- 6+ years of relevant industry experience.
- Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design.
- Experience with performance-oriented parallel programming (e.g., OpenMP or pthreads), even if not on GPUs.
- Solid understanding of computer architecture and some experience with assembly programming.
Ways to stand out
- Experience tuning BLAS or deep learning library kernel code.
- CUDA/OpenCL GPU programming experience.
- Knowledge of numerical methods and linear algebra.
- Experience with LLVM, TVM tensor expressions, or TensorFlow MLIR.
Benefits and additional information
- Base salary range:
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- You will also be eligible for equity and benefits.
- Applications will be accepted at least until October 24, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.
Technologies and libraries mentioned
cuDNN, cuBLAS, TensorRT, CUTLASS (open-source), CUDA, C++, OpenMP, pthreads, LLVM, TVM, TensorFlow MLIR, Tensor Cores, linear algebra, numerical methods.