Senior GenAI Algorithms Engineer — Model Optimizations for Inference

at Nvidia
USD 148,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 6, Algorithms @ 4, Hiring @ 4, Communication @ 7, Mathematics @ 4, Debugging @ 7, API @ 4, LLM @ 4, PyTorch @ 6, CUDA @ 4, GPU @ 4

Details

NVIDIA is at the forefront of the generative AI revolution. The Algorithmic Model Optimization Team focuses on optimizing generative AI models (large language, visual-language, multimodal, and diffusion models) for maximal inference efficiency using techniques such as quantization, speculative decoding, sparsity, distillation, pruning, neural architecture search, and streamlined deployment strategies with open-source inference frameworks. This role is responsible for designing, implementing, and productionizing model optimization algorithms for inference and deployment on NVIDIA hardware, with a focus on ease of use, compute and memory efficiency, and accuracy-performance trade-offs through software-hardware co-design.
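
For a flavor of what one of the listed techniques looks like in code, below is a minimal, hedged sketch of post-training dynamic quantization using stock PyTorch on a toy model. It is purely illustrative: the team's actual work runs through TensorRT Model Optimizer and NVIDIA's inference stack, and the shapes and module choices here are placeholder assumptions.

```python
# Minimal sketch: post-training dynamic quantization of a toy model with
# stock PyTorch. Illustrative only; production pipelines for this role
# would use NVIDIA's TensorRT Model Optimizer and inference stack instead.
import torch
import torch.nn as nn

# A stand-in "model": linear layers are the module type that dynamic
# quantization targets by default.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Quantize Linear weights to int8; activations are quantized dynamically
# at runtime. This trades a small accuracy loss for lower memory traffic.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(8, 1024)
with torch.inference_mode():
    out = quantized(x)
print(out.shape)  # torch.Size([8, 1024])
```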

Responsibilities

  • Design and build modular, scalable model optimization software platforms that deliver strong user experiences while supporting diverse AI models and optimization techniques.
  • Explore, develop, and integrate deep learning optimization algorithms (e.g., quantization, speculative decoding, sparsity) into NVIDIA's AI software stack, including TensorRT Model Optimizer, NeMo/Megatron, and TensorRT-LLM.
  • Deploy optimized models into leading open-source inference frameworks and contribute specialized APIs, model-level optimizations, and new features targeted to NVIDIA hardware capabilities.
  • Partner with internal NVIDIA teams to deliver model optimization solutions for customer use cases, ensuring optimal end-to-end workflows and balanced accuracy-performance trade-offs.
  • Conduct deep GPU kernel-level profiling to identify hardware and software optimization opportunities (e.g., efficient attention kernels, KV cache optimization, parallelism strategies); see the profiling sketch after this list.
  • Drive continuous innovation in deep learning inference performance to strengthen NVIDIA platform integration and expand market adoption across the AI inference ecosystem.
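
The kernel-level profiling responsibility above can be illustrated with a short, hedged sketch using torch.profiler. The model, shapes, and sort key are illustrative placeholders, not NVIDIA's internal tooling or workflow.

```python
# Minimal sketch of kernel-level profiling with torch.profiler.
# All shapes and the toy model are illustrative placeholders.
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(4096, 4096).cuda().half().eval()
x = torch.randn(64, 4096, device="cuda", dtype=torch.float16)

with torch.inference_mode(), profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    for _ in range(10):
        model(x)

# Rank kernels by accumulated GPU time to find optimization targets
# (e.g., attention kernels, memory-bound GEMMs).
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

Ranking kernels by GPU time is a common first step before drilling into specific kernels with lower-level tools such as Nsight.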

Requirements

  • Master's, PhD, or equivalent experience in Computer Science, Artificial Intelligence, Applied Mathematics, or a related field.
  • 5+ years of relevant work or research experience in deep learning.
  • Strong software design skills, including debugging, performance analysis, and test development.
  • Proficiency in Python, PyTorch, and modern ML frameworks/tools.
  • Proven foundation in algorithms and programming fundamentals.
  • Strong written and verbal communication skills, with the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

  • Contributions to PyTorch, JAX, vLLM, SGLang, or other ML training and inference frameworks.
  • Hands-on experience training or fine-tuning generative AI models on large-scale GPU clusters.
  • Proficiency with GPU architectures and compilation stacks, and the ability to analyze and debug end-to-end performance.
  • Familiarity with NVIDIA's deep learning SDKs (e.g., TensorRT).
  • Experience developing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton (a minimal Triton sketch follows this list).
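
As a rough illustration of that last point, below is the canonical vector-add kernel written in Triton. It is a teaching-scale sketch under simple assumptions, not representative of the fused attention or GEMM-epilogue kernels such work typically involves.

```python
# Minimal Triton kernel sketch: elementwise vector add.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)       # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
assert torch.allclose(add(x, y), x + y)
```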

Compensation & Benefits

  • Base salary ranges (location and level dependent):
    • Level 3: 148,000 USD - 235,750 USD
    • Level 4: 184,000 USD - 287,500 USD
  • Eligible for equity and benefits (see NVIDIA benefits page).

Additional Information

  • Location: Santa Clara, CA, United States.
  • Time type: Full time.
  • Applications for this job will be accepted at least until September 26, 2025.
  • NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.