Senior Deep Learning Software Engineer, Inference and Model Optimization

at Nvidia
USD 148,000-287,500 per year
Seniority: Senior
Workplace: On-site

Required Skills & Competences

Python @ 3, Algorithms @ 7, Machine Learning @ 4, Communication @ 4, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 3, CUDA @ 4, GPU @ 4

Details

NVIDIA's Algorithmic Model Optimization Team focuses on optimizing generative AI models (large language models and diffusion models) for maximal inference efficiency. The team conducts applied research and builds a software platform (TRT Model Optimizer) used internally and externally to enable best-in-class AI model deployment and inference.

Responsibilities

  • Train, develop, and deploy state-of-the-art generative AI models such as LLMs and diffusion models using NVIDIA's AI software stack.
  • Leverage and build upon the Torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary Torch models for automated deployment (see the torch.export sketch after this list).
  • Develop high-performance optimization techniques for inference, including automated model sharding (tensor parallelism, sequence parallelism), efficient attention kernels with KV-caching, and related techniques (a KV-cache decoding sketch follows this list).
  • Collaborate across teams to integrate performant kernel implementations (CUDA, TRT-LLM, Triton) into the automated deployment solution.
  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities (see the profiling sketch after this list).
  • Architect and design a modular, scalable software platform with broad model support and optimization capabilities to improve user experience and adoption.
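
To make the graph-extraction work concrete, here is a minimal sketch of capturing a standardized graph from an arbitrary PyTorch module with torch.export; the model and input shapes are illustrative placeholders, not NVIDIA's tooling.

```python
# Minimal sketch: extract a standardized ATen-level graph from an
# arbitrary torch.nn.Module via torch.export (PyTorch >= 2.1).
# TinyMLP and the input shape are illustrative placeholders.
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP().eval()
example_inputs = (torch.randn(1, 64),)

# torch.export captures a whole-program graph that downstream
# deployment tooling can analyze and transform.
exported = torch.export.export(model, example_inputs)
print(exported.graph_module.graph)  # standardized FX graph of ATen ops
```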
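
The KV-caching mentioned above can likewise be sketched in a few lines: keys and values for past tokens are computed once and reused, so each decode step attends a single new query against the cache instead of re-encoding the whole sequence. Shapes, names, and the single-head layout here are illustrative assumptions.

```python
# Hedged sketch of KV-cached autoregressive decoding (single head).
import torch
import torch.nn.functional as F

def decode_step(q, k_new, v_new, k_cache, v_cache, pos):
    # Write this step's key/value into the preallocated cache.
    k_cache[:, pos] = k_new
    v_cache[:, pos] = v_new
    # Attend the one new query token against all cached positions.
    k = k_cache[:, : pos + 1]                              # (batch, pos+1, dim)
    v = v_cache[:, : pos + 1]
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (batch, 1, pos+1)
    return F.softmax(scores, dim=-1) @ v                   # (batch, 1, dim)

batch, max_len, dim = 2, 16, 64
k_cache = torch.zeros(batch, max_len, dim)
v_cache = torch.zeros(batch, max_len, dim)
q = torch.randn(batch, 1, dim)
out = decode_step(q, torch.randn(batch, dim), torch.randn(batch, dim),
                  k_cache, v_cache, pos=0)
```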
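
For the kernel-level profiling responsibility, a minimal torch.profiler sketch follows; it assumes a CUDA device is available, and the profiled model is a placeholder.

```python
# Sketch: per-kernel GPU timing with torch.profiler (assumes CUDA).
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)

# Sorting by CUDA time surfaces the kernels worth optimizing first.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```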

Requirements

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
  • 3+ years of relevant work or research experience in deep learning.
  • Excellent software design skills, including debugging, performance analysis, and test design.
  • Strong proficiency in Python and PyTorch and familiarity with related ML tools (e.g., HuggingFace).
  • Strong algorithms and programming fundamentals.
  • Good written and verbal communication skills and ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

  • Contributions to PyTorch, JAX, or other machine learning frameworks.
  • Knowledge of GPU architecture and compilation stacks, plus the ability to understand and debug end-to-end performance.
  • Familiarity with NVIDIA deep learning SDKs such as TensorRT.
  • Prior experience writing high-performance GPU kernels for ML workloads in CUDA, CUTLASS, or Triton (a minimal Triton sketch follows below).
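
As an illustration of this kind of kernel work, here is the canonical vector-add kernel in the style of Triton's introductory tutorial; it is a minimal sketch rather than an ML-workload kernel, and it requires a CUDA device.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    # Each program instance handles one contiguous BLOCK_SIZE chunk.
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```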

Compensation and other details

  • Base salary ranges provided by location and level:
    • Level 3: USD 148,000 - 235,750
    • Level 4: USD 184,000 - 287,500
  • Eligible for equity and benefits.
  • Applications accepted at least until October 20, 2025.

Equal opportunity

NVIDIA is an equal opportunity employer committed to fostering a diverse work environment and does not discriminate on the basis of protected characteristics.