Senior Deep Learning Software Engineer, FlashInfer

at NVIDIA
USD 184,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

  • Python @ 7
  • Machine Learning @ 4
  • TensorFlow @ 7
  • LLM @ 4
  • PyTorch @ 7
  • CUDA @ 7
  • GPU @ 4

Details

We are looking for a Senior Deep Learning Software Engineer to join the FlashInfer team at NVIDIA. You will help develop AI systems software to accelerate inference, including libraries, code generators, and GPU kernel technologies for NVIDIA hardware. Work includes designing and building efficient attention kernel implementations, LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other AI workloads.

Responsibilities

  • Innovate and develop new AI systems technologies for efficient inference
  • Design, implement, and optimize kernels for high-impact AI workloads
  • Design and implement extensible abstractions for LLM serving engines
  • Build efficient just-in-time domain-specific compilers and runtimes
  • Collaborate closely with engineers across deep learning frameworks, libraries, kernels, and GPU architecture teams
  • Contribute to open source communities and projects such as FlashInfer, vLLM, and SGLang

Requirements

  • Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); PhD preferred
  • 6+ years of academic or industry experience in ML/DL systems development preferred
  • Strong experience developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX)
  • Experience with inference engines and runtimes such as vLLM, SGLang, and MLC is desirable
  • Strong programming skills in Python and C/C++

Ways to stand out

  • Background in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, Flash Attention)
  • Expertise with inference engines like vLLM and SGLang
  • Experience with machine learning compilers (e.g., Apache TVM, MLIR)
  • Strong experience in GPU kernel development and performance optimization (especially using CUDA C/C++, cuTile, Triton, or similar)
  • Open source project ownership or contributions

Compensation & Benefits

  • Base salary range: 184,000 USD - 287,500 USD (determined based on location, experience, and internal pay bands)
  • Eligible for equity and benefits

Additional details

  • Applications for this job will be accepted at least until August 5, 2025
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment