Senior GenAI Algorithms Engineer — Post-Training Optimizations

at NVIDIA
USD 184,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 6, Algorithms @ 4, Hiring @ 4, Communication @ 7, Mathematics @ 4, Debugging @ 7, API @ 4, LLM @ 4, PyTorch @ 6, CUDA @ 4, GPU @ 4

Details

NVIDIA's Algorithmic Model Optimization Team focuses on optimizing generative AI models (LLMs, diffusion models, VLMs, and multi-modality models) for maximal inference efficiency using techniques such as quantization, speculative decoding, sparsity, knowledge distillation, pruning, neural architecture search, and streamlined deployment strategies with open-source inference frameworks. The role involves designing, implementing, and productionizing model optimization algorithms for inference and deployment on NVIDIA hardware platforms, with an emphasis on ease of use, compute and memory efficiency, and software–hardware co-design.
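
For context on the simplest of the techniques named above, the sketch below applies stock PyTorch post-training dynamic quantization to a toy model. It is a minimal, hedged illustration only: the model, layer sizes, and use of torch.ao.quantization are assumptions for the example and do not represent NVIDIA's TensorRT Model Optimizer workflow.

import torch
import torch.nn as nn

# Toy two-layer model standing in for a much larger generative model.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Convert Linear weights to int8; activations are quantized on the fly at
# inference time, trading a small accuracy cost for memory/compute savings.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 1024))
print(out.shape)  # torch.Size([1, 1024])

Production workflows for LLM-scale models rely on framework-level tooling (e.g., the NVIDIA SDKs listed later in this posting) rather than this per-module API, but the accuracy-versus-efficiency trade-off is the same.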

Responsibilities

  • Design and build modular, scalable model optimization software platforms that support diverse AI models and optimization techniques and deliver strong user experiences.
  • Explore, develop, and integrate deep learning optimization algorithms (quantization, speculative decoding, sparsity, etc.) into NVIDIA's AI software stack (TensorRT Model Optimizer, NeMo/Megatron, TensorRT-LLM).
  • Construct and curate large problem-specific datasets for post-training, fine-tuning, and reinforcement learning workflows.
  • Deploy optimized models into leading open-source inference frameworks and contribute APIs, model-level optimizations, and features tuned to NVIDIA hardware capabilities.
  • Partner with NVIDIA teams to deliver model optimization solutions for customer use cases, ensuring balanced accuracy–performance trade-offs and end-to-end workflows.
  • Drive continuous innovation in deep learning inference performance to strengthen NVIDIA platform integration and expand market adoption.

Requirements

  • Master’s, PhD, or equivalent experience in Computer Science, Artificial Intelligence, Applied Mathematics, or a related field.
  • 5+ years of relevant work or research experience in deep learning.
  • Strong software design skills including debugging, performance analysis, and test development.
  • Proficiency in Python, PyTorch, and modern ML frameworks/tools.
  • Proven foundation in algorithms and programming fundamentals.
  • Strong written and verbal communication skills, and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

  • Contributions to PyTorch, Megatron-LM, NeMo, TensorRT-LLM, vLLM, SGLang, or other ML training/inference frameworks.
  • Hands-on training, fine-tuning, or reinforcement learning experience on LLM or VLM models with large-scale GPU clusters.
  • Proficiency in GPU architectures and compilation stacks, and the ability to analyze and debug end-to-end performance.
  • Familiarity with NVIDIA deep learning SDKs (NeMo, TensorRT, TensorRT-LLM).
  • Experience with custom kernel development on GPU (CUDA, Triton) and software–hardware co-design.

Compensation & Benefits

  • Base salary range: 184,000-287,500 USD, determined by location, experience, and the pay of employees in similar positions.
  • Eligible for equity and NVIDIA benefits (link to benefits in original posting).

Additional Information

  • Applications accepted at least until September 20, 2025.
  • NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.