Senior GenAI Algorithms Engineer — Post-Training Optimizations

at NVIDIA
USD 184,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 6, Algorithms @ 4, Hiring @ 4, Communication @ 7, Mathematics @ 4, Debugging @ 7, API @ 4, LLM @ 4, PyTorch @ 6, CUDA @ 4, GPU @ 4

Details

NVIDIA's Algorithmic Model Optimization Team focuses on optimizing generative AI models (LLMs, diffusion models, VLMs, and multi-modality models) for maximal inference efficiency using techniques such as quantization, speculative decoding, sparsity, knowledge distillation, pruning, neural architecture search, and streamlined deployment strategies with open-source inference frameworks. The role involves designing, implementing, and productionizing model optimization algorithms for inference and deployment on NVIDIA hardware platforms, with an emphasis on ease of use, compute and memory efficiency, and software–hardware co-design.
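
For context on the simplest of the techniques named above, the sketch below applies stock PyTorch post-training dynamic quantization to a toy model. It is a minimal, hedged illustration only: the model, layer sizes, and use of torch.ao.quantization are assumptions for the example and do not represent NVIDIA's TensorRT Model Optimizer workflow.

import torch
import torch.nn as nn

# Toy two-layer model standing in for a much larger generative model.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Convert Linear weights to int8; activations are quantized on the fly at
# inference time, trading a small accuracy cost for memory/compute savings.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 1024))
print(out.shape)  # torch.Size([1, 1024])

Production workflows for LLM-scale models rely on framework-level tooling (e.g., the NVIDIA SDKs listed later in this posting) rather than this per-module API, but the accuracy-versus-efficiency trade-off is the same.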

Responsibilities

  • Design and build modular, scalable model optimization software platforms that support diverse AI models and optimization techniques and deliver strong user experiences.
  • Explore, develop, and integrate deep learning optimization algorithms (quantization, speculative decoding, sparsity, etc.) into NVIDIA's AI software stack (TensorRT Model Optimizer, NeMo/Megatron, TensorRT-LLM).
  • Construct and curate large problem-specific datasets for post-training, fine-tuning, and reinforcement learning workflows.
  • Deploy optimized models into leading open-source inference frameworks and contribute APIs, model-level optimizations, and features tuned to NVIDIA hardware capabilities.
  • Partner with NVIDIA teams to deliver model optimization solutions for customer use cases, ensuring balanced accuracy–performance trade-offs and end-to-end workflows.
  • Drive continuous innovation in deep learning inference performance to strengthen NVIDIA platform integration and expand market adoption.

Requirements

  • Master’s, PhD, or equivalent experience in Computer Science, Artificial Intelligence, Applied Mathematics, or a related field.
  • 5+ years of relevant work or research experience in deep learning.
  • Strong software design skills including debugging, performance analysis, and test development.
  • Proficiency in Python, PyTorch, and modern ML frameworks/tools.
  • Proven foundation in algorithms and programming fundamentals.
  • Strong written and verbal communication skills, and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

  • Contributions to PyTorch, Megatron-LM, NeMo, TensorRT-LLM, vLLM, SGLang, or other ML training/inference frameworks.
  • Hands-on training, fine-tuning, or reinforcement learning experience on LLM or VLM models with large-scale GPU clusters.
  • Proficiency in GPU architectures and compilation stacks, and the ability to analyze and debug end-to-end performance.
  • Familiarity with NVIDIA deep learning SDKs (NeMo, TensorRT, TensorRT-LLM).
  • Experience with custom kernel development on GPU (CUDA, Triton) and software–hardware co-design.

Compensation & Benefits

  • Base salary range: 184,000-287,500 USD, determined by location, experience, and the pay of employees in similar positions.
  • Eligible for equity and NVIDIA benefits (link to benefits in original posting).

Additional Information

  • Applications accepted at least until September 20, 2025.
  • NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.