Senior Deep Learning Software Engineer, Inference and Model Optimization
at NVIDIA
Santa Clara, United States
USD 148,000-287,500 per year
Required Skills & Competences
Python @ 3, Algorithms @ 7, Machine Learning @ 4, Leadership @ 4, Communication @ 4, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 3, CUDA @ 4, GPU @ 4
Details
NVIDIA's Algorithmic Model Optimization Team focuses on optimizing generative AI models (large language models and diffusion models) for maximal inference efficiency using techniques such as neural architecture search, pruning, sparsity, quantization, and automated deployment strategies. The team conducts applied research and develops the TRT Model Optimizer software platform used internally and externally by research and engineering teams.
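To illustrate one of the techniques named above, the sketch below shows a minimal post-training weight quantization pass in plain PyTorch (symmetric per-tensor int8 with immediate dequantization to measure round-trip error). It is an illustrative assumption-level example only, not the TRT Model Optimizer API, and the toy model is hypothetical.

```python
# Illustrative post-training quantization sketch (not the TRT Model Optimizer
# API): symmetric per-tensor int8 fake-quantization of Linear weights.
import torch
import torch.nn as nn

def quantize_dequantize(w: torch.Tensor, num_bits: int = 8):
    """Fake-quantize a weight tensor and report the round-trip error."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for int8
    scale = w.abs().max() / qmax                # per-tensor symmetric scale
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    w_dq = q * scale                            # dequantized approximation
    err = (w - w_dq).abs().max().item()
    return w_dq, err

# Hypothetical toy model standing in for an LLM or diffusion-model layer.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        w_dq, err = quantize_dequantize(module.weight.data)
        module.weight.data.copy_(w_dq)          # simulate int8 weights
        print(f"{name}: max abs quantization error {err:.4f}")
```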
Responsibilities
- Train, develop, and deploy generative AI models (LLMs and diffusion models) using NVIDIA's AI software stack.
- Leverage and build upon the PyTorch 2 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary PyTorch models for automated deployment (see the torch.export sketch after this list).
- Develop high-performance optimization techniques for inference, including automated model sharding (tensor parallelism, sequence parallelism), efficient attention kernels with KV-caching, and related approaches (a KV-cache sketch also follows this list).
- Collaborate with cross-functional teams across NVIDIA to integrate performant kernel implementations into the automated deployment solution.
- Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
- Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain and increase market leadership.
- Architect and design a modular, scalable software platform that provides broad model support and a strong user experience.
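For the graph-extraction responsibility above, here is a minimal sketch of capturing a standardized graph representation with torch.export, assuming PyTorch 2.x; the toy module and input shapes are hypothetical, and this is not the team's deployment pipeline.

```python
# Minimal sketch of capturing a standardized graph with torch.export
# (assumes PyTorch 2.x; the toy module below is hypothetical).
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

example_input = (torch.randn(2, 64),)
exported = torch.export.export(ToyBlock(), example_input)

# The ExportedProgram holds a normalized FX graph that downstream tooling
# (sharding, kernel substitution, deployment) can analyze and transform.
print(exported.graph_module.graph)
```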
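Similarly, for the KV-caching point, the sketch below shows the basic idea behind incremental decoding with a key/value cache in plain PyTorch (single head, no fused kernels). It is illustrative only and not the TRT-LLM implementation.

```python
# Illustrative single-head KV-cache for incremental decoding (plain PyTorch,
# unfused); not the TRT-LLM implementation.
import math
import torch

def decode_step(q, k_new, v_new, k_cache, v_cache):
    """Append the new token's K/V to the cache and attend over the full cache."""
    k_cache = torch.cat([k_cache, k_new], dim=1)    # [batch, seq+1, dim]
    v_cache = torch.cat([v_cache, v_new], dim=1)
    scores = q @ k_cache.transpose(-1, -2) / math.sqrt(q.shape[-1])
    attn = torch.softmax(scores, dim=-1)
    out = attn @ v_cache                            # [batch, 1, dim]
    return out, k_cache, v_cache

batch, dim = 2, 64
k_cache = torch.empty(batch, 0, dim)
v_cache = torch.empty(batch, 0, dim)
for _ in range(4):                                  # four decode steps
    q = torch.randn(batch, 1, dim)
    k_new = torch.randn(batch, 1, dim)
    v_new = torch.randn(batch, 1, dim)
    out, k_cache, v_cache = decode_step(q, k_new, v_new, k_cache, v_cache)
print(out.shape, k_cache.shape)                     # [2, 1, 64] and [2, 4, 64]
```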
Requirements
- Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
- 3+ years of relevant work or research experience in deep learning.
- Excellent software design skills, including debugging, performance analysis, and test design.
- Strong proficiency in Python and PyTorch, and familiarity with related ML tooling (e.g., HuggingFace).
- Strong algorithms and programming fundamentals.
- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.
Preferred / Ways to Stand Out
- Contributions to PyTorch, JAX, or other machine learning frameworks.
- Knowledge of GPU architecture and compilation stacks, and the ability to understand and debug end-to-end performance.
- Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
- Prior experience writing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton.
Compensation & Benefits
- Base salary ranges by level (determined by location, experience, and peer pay):
  - Level 3: USD 148,000-235,750
  - Level 4: USD 184,000-287,500
- Eligible for equity and benefits (link referenced in original posting).
Additional Information
- Applications accepted at least until October 20, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.