Deep Learning Software Engineer, Inference and Model Optimization - New College Grad 2025

at NVIDIA
USD 120,000-189,750 per year
JUNIOR
✅ On-site

Required Skills & Competences

Python (6), Algorithms (6), Leadership (3), Debugging (3), LLM (3), PyTorch (3), CUDA (3), GPU (3)

Details

NVIDIA's Algorithmic Model Optimization Team optimizes generative AI models (LLMs and diffusion models) for maximal inference efficiency using techniques such as neural architecture search, pruning, sparsity, quantization, and automated deployment strategies. The team conducts applied research and develops a software platform (TRT Model Optimizer), used both inside and outside NVIDIA, to build best-in-class AI models.
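To make one of those techniques concrete, here is a minimal sketch of post-training quantization using plain PyTorch dynamic quantization. This is illustrative only: it is not the TRT Model Optimizer API, and the toy model is invented for the example.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer feed-forward block (hypothetical example);
# any nn.Module containing Linear layers works the same way.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
model.eval()

# Convert Linear weights to int8; activations are quantized dynamically at
# inference time, shrinking the model and speeding up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.inference_mode():
    print(quantized(x).shape)  # torch.Size([1, 512])
```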

Responsibilities

  • Train, develop, and deploy state-of-the-art generative AI models (LLMs and diffusion models) using NVIDIA's AI software stack.
  • Leverage and extend the Torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary PyTorch models for automated deployment (see the sketch after this list).
  • Develop high-performance inference optimization techniques, including automated model sharding (tensor parallelism, sequence parallelism), efficient attention kernels with kv-caching, and other optimizations.
  • Collaborate across NVIDIA to integrate performant kernel implementations (CUDA, TRT-LLM, Triton, TensorRT) into the automated deployment solution.
  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
  • Continuously innovate on inference performance to ensure NVIDIA's inference solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain leadership.
  • Architect and design a modular, scalable software platform that provides broad model support, strong UX, and adoption-driving optimization techniques.
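
For the graph-extraction bullet above, here is a minimal sketch of pulling a standardized graph out of an arbitrary PyTorch model with torch.export. The TinyModel module is a made-up stand-in, and a real deployment pipeline would do considerably more with the resulting ExportedProgram.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Hypothetical model standing in for 'an arbitrary PyTorch model'."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.proj(x))

# torch.export traces the model into an ExportedProgram: a functionalized,
# ATen-level FX graph plus input/output signatures, i.e. the kind of
# standardized representation an automated deployment flow consumes.
exported = torch.export.export(TinyModel(), (torch.randn(2, 16),))
print(exported.graph_module.graph)
```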

Requirements

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
  • Experience in deep learning and working with generative models (LLMs, diffusion models).
  • Excellent software design skills, including debugging, performance analysis, and test design.
  • Strong proficiency in Python and PyTorch; experience with related ML tools such as Hugging Face.
  • Strong algorithms and programming fundamentals.

Ways to stand out

  • Contributions to ML frameworks such as PyTorch or JAX.
  • Knowledge of GPU architecture and compilation stack with the ability to understand and debug end-to-end performance.
  • Familiarity with NVIDIA deep learning SDKs such as TensorRT and TRT-LLM.
  • Experience writing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton (see the sketch after this list).
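
For the last bullet, here is a minimal Triton sketch of the kind of kernel work involved. The kernel name and block size are arbitrary choices for the example, and it assumes a CUDA-capable GPU with the triton package installed.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized slice of the input.
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n  # guard the tail block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

a = torch.randn(4096, device="cuda")
b = torch.randn(4096, device="cuda")
assert torch.allclose(add(a, b), a + b)
```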

Compensation & Benefits

  • Base salary range: 120,000 USD - 189,750 USD, determined by location, experience, and the pay of employees in similar positions.
  • Eligible for equity and a comprehensive benefits package (see NVIDIA benefits page).

Other

  • Applications accepted at least until October 18, 2025.
  • NVIDIA is an equal opportunity employer committed to fostering diversity and inclusion.