Deep Learning Software Engineer, FlashInfer - New College Grad 2025

at NVIDIA
USD 104,000-172,500 per year
JUNIOR
On-site

Required Skills & Competences

Python, Machine Learning, TensorFlow, LLM, PyTorch, CUDA, GPU

Details

NVIDIA is seeking Deep Learning Software Engineers to develop AI inference systems software that accelerates inference for large language models (LLMs) and other AI workloads. The team builds libraries, code generators, GPU kernel technologies, and inference runtimes (for example, FlashInfer, vLLM, and SGLang) to optimize LLM serving and other high-impact AI workloads. The role involves designing abstractions, implementing efficient kernels, building just-in-time (JIT) domain-specific compilers and runtimes, and collaborating with framework, library, and GPU architecture teams.

Responsibilities

  • Innovate and develop new AI systems technologies for efficient inference
  • Design, implement, and optimize kernels for high-impact AI workloads
  • Design and implement extensible abstractions for LLM serving engines
  • Build efficient just-in-time domain-specific compilers and runtimes
  • Collaborate closely with other engineers across deep learning frameworks, libraries, kernels, and GPU architecture teams
  • Contribute to open source communities and projects such as FlashInfer, vLLM, and SGLang

Requirements

  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); PhD preferred
  • Strong experience developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX)
  • Ideally, experience with inference engines and runtimes such as vLLM, SGLang, and MLC
  • Strong Python and C/C++ programming skills

Ways to stand out

  • Background in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, Flash Attention)
  • Expertise in inference engines like vLLM and SGLang
  • Expertise in machine learning compilers (e.g., Apache TVM, MLIR)
  • Strong experience in GPU kernel development and performance optimization (especially using CUDA C/C++, cuTile, Triton, or similar)
  • Open-source project ownership or contributions

Compensation & Benefits

  • Base salary range: 104,000 USD - 172,500 USD (final base salary is determined by location, experience, and internal pay equity)
  • Eligible for equity and benefits (see NVIDIA benefits)

Other details

  • Applications for this job will be accepted at least until August 22, 2025.
  • NVIDIA is an equal opportunity employer and values diversity in its workforce.