Senior GenAI Algorithms Engineer — Model Optimizations for Inference
Required Skills & Competences
Python (6), Algorithms (4), Hiring (4), Communication (7), Mathematics (4), Debugging (7), API (4), LLM (4), PyTorch (6), CUDA (4), GPU (4)
NVIDIA is at the forefront of the generative AI revolution. The Algorithmic Model Optimization Team focuses on optimizing generative AI models (large language models, visual-language models, multimodal, and diffusion models) for maximal inference efficiency using techniques such as quantization, speculative decoding, sparsity, distillation, pruning, neural architecture search, and streamlined deployment strategies with open-source inference frameworks. This role is responsible for designing, implementing, and productionizing model optimization algorithms for inference and deployment on NVIDIA hardware, with a focus on ease of use, compute and memory efficiency, and accuracy–performance tradeoffs through software–hardware co-design.
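As a concrete illustration of the first technique on that list, the sketch below shows post-training int8 weight quantization (per-channel, symmetric) in plain PyTorch. It is a minimal, illustrative example and assumes nothing about NVIDIA's internal tooling; production flows such as TensorRT Model Optimizer additionally handle activation calibration, per-layer configuration, and deployment export.

```python
# Minimal sketch of post-training int8 weight quantization (per-channel,
# symmetric). Illustrative only; layer sizes and the error metric are
# arbitrary stand-ins, not part of any specific production workflow.
import torch
import torch.nn as nn

def quantize_weight_int8(w: torch.Tensor):
    """Quantize a [out_features, in_features] weight per output channel."""
    # One scale per output channel, chosen so the max |w| maps to 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Example: quantize one Linear layer and measure the error it introduces.
layer = nn.Linear(4096, 4096, bias=False)
q, scale = quantize_weight_int8(layer.weight.data)
w_hat = dequantize(q, scale)

x = torch.randn(8, 4096)
err = (x @ layer.weight.T - x @ w_hat.T).abs().mean()
print(f"mean absolute output error after int8 round-trip: {err.item():.6f}")
```

Per-channel scaling is the common default for weight-only quantization because it bounds the rounding error of each output channel independently, which helps preserve accuracy at low bit widths.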
Responsibilities
- Design and build modular, scalable model optimization software platforms that deliver strong user experiences while supporting diverse AI models and optimization techniques.
- Explore, develop, and integrate deep learning optimization algorithms (e.g., quantization, speculative decoding, sparsity) into NVIDIA's AI software stack, including TensorRT Model Optimizer, NeMo/Megatron, and TensorRT-LLM (a toy speculative-decoding sketch follows this list).
- Deploy optimized models into leading open-source inference frameworks and contribute specialized APIs, model-level optimizations, and new features targeted to NVIDIA hardware capabilities.
- Partner with internal NVIDIA teams to deliver model optimization solutions for customer use cases, ensuring optimal end-to-end workflows and balanced accuracy-performance trade-offs.
- Conduct deep GPU kernel-level profiling to identify hardware and software optimization opportunities (e.g., efficient attention kernels, KV cache optimization, parallelism strategies); a minimal torch.profiler sketch also follows this list.
- Drive continuous innovation in deep learning inference performance to strengthen NVIDIA platform integration and expand market adoption across the AI inference ecosystem.
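The speculative-decoding sketch referenced above uses greedy verification with two toy models. The model definitions, vocabulary size, and draft length K are placeholders chosen for readability, not anything prescribed by the role or by NVIDIA's stack; the point is the control flow: cheap drafting, one verifying forward pass, accept the longest matching prefix.

```python
# Hedged sketch of speculative decoding with greedy verification: a small
# "draft" model proposes K tokens autoregressively, the larger "target"
# model scores all of them in a single forward pass, and the longest prefix
# whose tokens match the target's own greedy choices is accepted.
import torch
import torch.nn as nn

VOCAB, K = 100, 4

class ToyLM(nn.Module):
    """Tiny causal LM (embedding + linear head), enough to show the control flow."""
    def __init__(self, dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, ids):                     # ids: [T] -> logits: [T, VOCAB]
        return self.head(self.embed(ids))

draft, target = ToyLM(16), ToyLM(64)

@torch.no_grad()
def speculative_step(prompt: torch.Tensor) -> torch.Tensor:
    # 1) Draft model proposes K tokens, one at a time (cheap).
    ids = prompt.clone()
    for _ in range(K):
        nxt = draft(ids)[-1].argmax()
        ids = torch.cat([ids, nxt.view(1)])

    # 2) Target model verifies all K proposals in ONE forward pass, so its
    #    cost is amortized over K positions instead of K sequential calls.
    logits = target(ids)
    verify = logits[len(prompt) - 1 : -1].argmax(dim=-1)   # target's greedy picks
    drafted = ids[len(prompt):]                            # draft's proposals

    # 3) Accept the longest matching prefix, then take one token from the target.
    n_accept = int((verify == drafted).int().cumprod(dim=0).sum())
    accepted = drafted[:n_accept]
    correction = verify[n_accept].view(1) if n_accept < K else logits[-1].argmax().view(1)
    return torch.cat([prompt, accepted, correction])

out = speculative_step(torch.randint(0, VOCAB, (8,)))
print("sequence length after one speculative step:", out.shape[0])
```

With random toy weights the acceptance rate is near zero; in practice the draft model is trained or distilled to agree with the target, which is what makes the single verification pass pay off.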
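For the kernel-level profiling item, the sketch below shows a first-pass profile with torch.profiler of the kind used to locate attention or GEMM hotspots before digging into Nsight or kernel code. The model and input shapes are arbitrary stand-ins; it falls back to CPU-only profiling when no GPU is available.

```python
# Minimal kernel-level profiling sketch with torch.profiler: run a few
# forward passes and rank ops by time spent on the device.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).to(device)
x = torch.randn(8, 256, 512, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        for _ in range(10):
            model(x)

# Sort by device time to see which kernels dominate the step.
sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```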
Requirements
- Master's, PhD, or equivalent experience in Computer Science, Artificial Intelligence, Applied Mathematics, or a related field.
- 5+ years of relevant work or research experience in deep learning.
- Strong software design skills, including debugging, performance analysis, and test development.
- Proficiency in Python, PyTorch, and modern ML frameworks/tools.
- Proven foundation in algorithms and programming fundamentals.
- Strong written and verbal communication skills, with the ability to work independently and collaboratively in a fast-paced environment.
Ways to stand out
- Contributions to PyTorch, JAX, vLLM, SGLang, or other ML training and inference frameworks.
- Hands-on experience training or fine-tuning generative AI models on large-scale GPU clusters.
- Proficiency with GPU architectures and compilation stacks, and skill in analyzing and debugging end-to-end performance.
- Familiarity with NVIDIA's deep learning SDKs (e.g., TensorRT).
- Experience developing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton.
Compensation & Benefits
- Base salary ranges (location and level dependent):
  - Level 3: 148,000 USD - 235,750 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and benefits (see NVIDIA benefits page).
Additional Information
- Location: Santa Clara, CA, United States.
- Time type: Full time.
- Applications for this job will be accepted at least until September 26, 2025.
- NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.