Senior Performance Software Engineer, Deep Learning Libraries

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

CI/CD @ 4, TensorFlow @ 3, Parallel Programming @ 4, Debugging @ 7, CUDA @ 4, GPU @ 4

Details

We are looking for a Senior Performance Software Engineer for Deep Learning Libraries to develop optimized code that accelerates linear algebra and deep learning operations on NVIDIA GPUs.

Responsibilities

Write highly tuned compute kernels, mostly in C++ CUDA, for core deep learning operations such as matrix multiplies, convolutions, and normalizations
Follow general software engineering best practices including support for regression testing and CI/CD flows
Collaborate with teams across NVIDIA, including:
- CUDA compiler team for generating optimal assembly code
- Deep learning training and inference performance teams for optimization needs
- Hardware and architecture teams on programming models for new deep learning hardware features

Requirements

Master's or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, Applied Math, or a related field
6+ years of relevant industry experience
Strong C++ programming and software design skills, including debugging, performance analysis, and test design
Experience with performance-oriented parallel programming (e.g., OpenMP, pthreads), even if not on GPUs
Solid understanding of computer architecture and some experience with assembly programming

Preferred Qualifications

Experience tuning BLAS or deep learning library kernel code
CUDA or OpenCL GPU programming
Knowledge of numerical methods and linear algebra
Familiarity with LLVM, TVM tensor expressions, or TensorFlow MLIR

Benefits

Competitive base salary range: USD 184,000-356,500 per year
Eligibility for equity and benefits
Work at NVIDIA, a leading technology company fostering diversity and innovation

Join the team that builds software underpinning AI breakthroughs in image classification, speech recognition, and natural language processing. Work on cutting-edge GPU efficiency and deep learning software stacks.