Senior Performance Software Engineer, Deep Learning Libraries

at NVIDIA
USD 184,000-356,500 per year
SENIOR
On-site

Required Skills & Competences

CI/CD @ 4 TensorFlow @ 3 Parallel Programming @ 4 Debugging @ 7 CUDA @ 4 GPU @ 4

Details

We are looking for a Senior Performance Software Engineer for Deep Learning Libraries to develop optimized code that accelerates linear algebra and deep learning operations on NVIDIA GPUs.

Responsibilities

  • Write highly tuned compute kernels, mostly in CUDA C++, for core deep learning operations such as matrix multiplications, convolutions, and normalizations
  • Follow general software engineering best practices including support for regression testing and CI/CD flows
  • Collaborate with teams across NVIDIA including:
    • CUDA compiler team for generating optimal assembly code
    • Deep learning training and inference performance teams for optimization needs
    • Hardware and architecture teams on programming models for new deep learning hardware features

Requirements

  • Master's or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, Applied Math, or a related field
  • 6+ years of relevant industry experience
  • Strong C++ programming and software design skills, including debugging, performance analysis, and test design
  • Experience with performance-oriented parallel programming (e.g., OpenMP, pthreads), even if not on GPUs
  • Solid understanding of computer architecture and some experience with assembly programming

Preferred Qualifications

  • Experience tuning BLAS or deep learning library kernel code
  • CUDA or OpenCL GPU programming
  • Knowledge of numerical methods and linear algebra
  • Familiarity with LLVM, TVM tensor expressions, or TensorFlow MLIR

Benefits

  • Competitive base salary range: 184,000 USD - 356,500 USD
  • Eligibility for equity and benefits
  • Work at NVIDIA, a leading technology company fostering diversity and innovation

Join the team that builds software underpinning AI breakthroughs in image classification, speech recognition, and natural language processing. Work on cutting-edge GPU efficiency and deep learning software stacks.