Senior Performance Software Engineer, Deep Learning Libraries
at NVIDIA
USD 184,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
- CI/CD @ 4
- TensorFlow @ 3
- Parallel Programming @ 4
- Debugging @ 7
- CUDA @ 4
- GPU @ 4
Details
We are looking for a Senior Performance Software Engineer for Deep Learning Libraries to develop optimized code that accelerates linear algebra and deep learning operations on NVIDIA GPUs.
Responsibilities
- Write highly tuned compute kernels, mostly in C++ CUDA, for core deep learning operations such as matrix multiplies, convolutions, and normalizations
- Follow general software engineering best practices including support for regression testing and CI/CD flows
- Collaborate with teams across NVIDIA, including:
  - CUDA compiler team for generating optimal assembly code
  - Deep learning training and inference performance teams for optimization needs
  - Hardware and architecture teams on programming models for new deep learning hardware features
Requirements
- Master's or PhD degree or equivalent experience in Computer Science, Computer Engineering, Applied Math, or related field
- 6+ years of relevant industry experience
- Strong C++ programming and software design skills, including debugging, performance analysis, and test design
- Experience with performance-oriented parallel programming (e.g., OpenMP, pthreads), even if not on GPUs
- Solid understanding of computer architecture and some experience with assembly programming
Preferred Qualifications
- Experience tuning BLAS or deep learning library kernel code
- CUDA or OpenCL GPU programming
- Knowledge of numerical methods and linear algebra
- Familiarity with LLVM, TVM tensor expressions, or TensorFlow MLIR
Benefits
- Competitive base salary range: 184,000 USD - 356,500 USD
- Eligibility for equity and benefits
- Work at NVIDIA, a leading technology company fostering diversity and innovation
Join the team that builds software underpinning AI breakthroughs in image classification, speech recognition, and natural language processing. Work on cutting-edge GPU efficiency and deep learning software stacks.