Senior Deep Learning Software Engineer, Inference and Model Optimization
at Nvidia
Santa Clara, United States
USD 184,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 7, Algorithms @ 7, Machine Learning @ 4, Leadership @ 4, Communication @ 4, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 7, CUDA @ 4, GPU @ 4
Details
NVIDIA's Algorithmic Model Optimization Team focuses on optimizing generative AI models (LLMs and diffusion models) for maximal inference efficiency using techniques such as neural architecture search, pruning, sparsity, quantization, and automated deployment strategies. The team conducts applied research to improve model efficiency and develops the TRT Model Optimizer platform used internally and externally by research and engineering teams.
Responsibilities
- Train, develop, and deploy state-of-the-art generative AI models such as large language models (LLMs) and diffusion models using NVIDIA's AI software stack.
- Leverage and build upon the Torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary torch models for automated deployment (a minimal capture sketch follows this list).
- Develop high-performance optimization techniques for inference, including automated model sharding (e.g., tensor parallelism, sequence parallelism), efficient attention kernels with kv-caching, and related techniques (see the kv-cache sketch after this list).
- Collaborate with teams across NVIDIA to integrate performant kernel implementations (CUDA, TRT-LLM, Triton) into the automated deployment solution.
- Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
- Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain and increase market leadership.
- Architect and design a modular, scalable software platform that provides broad model support, optimization techniques, and an excellent user experience to increase adoption.
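As context for the Torch 2.0 bullet above, here is a minimal sketch of graph capture with torch.export and torch.compile. It is illustrative only: the TinyModel class, its shapes, and the example input are invented for the example and are not part of NVIDIA's stack.

```python
# Minimal, illustrative sketch: capture a standardized graph from an
# arbitrary torch model with torch.export (the toy model is hypothetical).
import torch
from torch.export import export


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.proj(x))


model = TinyModel().eval()
example_input = (torch.randn(2, 16),)

# torch.export traces the model into an ExportedProgram whose FX graph
# can be inspected, transformed, or lowered toward a deployment backend.
exported = export(model, example_input)
print(exported.graph_module.graph)

# torch.compile exercises the same TorchDynamo stack on the JIT path.
compiled = torch.compile(model)
_ = compiled(*example_input)
```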
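Likewise, the kv-caching mentioned in the attention bullet can be sketched with PyTorch's scaled_dot_product_attention, which dispatches to fused attention kernels where available. The tensor shapes and cache layout below are assumptions chosen for illustration, not a production implementation.

```python
# Minimal, illustrative sketch: one decode step of attention with a kv-cache.
import torch
import torch.nn.functional as F

batch, heads, head_dim = 1, 8, 64
past_len = 127  # tokens already generated (hypothetical)

# Cached keys/values from previous decode steps
# (assumed layout: [batch, heads, seq, head_dim]).
k_cache = torch.randn(batch, heads, past_len, head_dim)
v_cache = torch.randn(batch, heads, past_len, head_dim)

# Projections for the single new token.
q_new = torch.randn(batch, heads, 1, head_dim)
k_new = torch.randn(batch, heads, 1, head_dim)
v_new = torch.randn(batch, heads, 1, head_dim)

# Append the new key/value to the cache instead of recomputing history.
k_cache = torch.cat([k_cache, k_new], dim=2)
v_cache = torch.cat([v_cache, v_new], dim=2)

# The new query attends over the full cached context in one fused call.
out = F.scaled_dot_product_attention(q_new, k_cache, v_cache)
print(out.shape)  # torch.Size([1, 8, 1, 64])
```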
Requirements
- Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
- 5+ years of relevant work or research experience in Deep Learning.
- Excellent software design skills, including debugging, performance analysis, and test design.
- Strong proficiency in Python and PyTorch; experience with HuggingFace is expected.
- Strong algorithms and programming fundamentals.
- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.
Ways to stand out
- Contributions to machine learning frameworks such as PyTorch or JAX.
- Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.
- Familiarity with NVIDIA's deep learning SDKs such as TensorRT and TRT-LLM.
- Prior experience writing high-performance GPU kernels for machine learning workloads using CUDA, CUTLASS, or Triton.
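For reference, a Triton kernel at its simplest looks like the sketch below: an element-wise vector add, illustrative only; production ML kernels such as fused attention or GEMMs are substantially more involved.

```python
# Minimal, illustrative Triton kernel: element-wise vector add.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```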
Benefits
- Competitive base salary (see ranges below) determined by location, experience, and internal pay parity.
- Eligibility for equity and a comprehensive benefits package.
- NVIDIA emphasizes diversity and is an equal opportunity employer.
Compensation and Application
- Base salary range for Level 4: 184,000 USD - 287,500 USD.
- Base salary range for Level 5: 224,000 USD - 356,500 USD.
- You will also be eligible for equity and benefits (see NVIDIA benefits page).
- Applications for this job will be accepted at least until July 29, 2025.