Deep Learning Software Engineer, FlashInfer - New College Grad 2025

at NVIDIA

📍 Santa Clara, United States

USD 104,000-172,500 per year

JUNIOR

✅ On-site

Required Skills & Competences

Python @ 6
Machine Learning @ 3
TensorFlow @ 6
LLM @ 3
PyTorch @ 6
CUDA @ 6
GPU @ 3

Details

NVIDIA is seeking Deep Learning Software Engineers to develop AI systems software that accelerates inference for large language models and other AI workloads. The team builds libraries, code generators, GPU kernel technologies, and inference runtimes (for example FlashInfer, vLLM, SGLang) to optimize LLM serving and high-impact AI workloads. This role involves designing abstractions, implementing efficient kernels, building JIT domain-specific compilers and runtimes, and collaborating with framework, library, and GPU architecture teams.

Responsibilities

Innovate and develop new AI systems technologies for efficient inference
Design, implement, and optimize kernels for high-impact AI workloads
Design and implement extensible abstractions for LLM serving engines
Build efficient just-in-time domain-specific compilers and runtimes
Collaborate closely with engineers across deep learning framework, library, kernel, and GPU architecture teams
Contribute to open source communities and projects such as FlashInfer, vLLM, and SGLang

Requirements

Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); PhD preferred
Strong experience developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX)
Ideally, experience with inference engines and runtimes such as vLLM, SGLang, and MLC
Strong Python and C/C++ programming skills

Ways to stand out

Background in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, FlashAttention)
Expertise in inference engines like vLLM and SGLang
Expertise in machine learning compilers (e.g., Apache TVM, MLIR)
Strong experience in GPU kernel development and performance optimization (especially using CUDA C/C++, cuTile, Triton, or similar)
Open-source project ownership or contributions

Compensation & Benefits

Base salary range: 104,000 USD - 172,500 USD (final base salary determined based on location, experience, and internal pay equity)
Eligible for equity and benefits (see NVIDIA benefits)

Other details

Applications for this job will be accepted at least until August 22, 2025.
NVIDIA is an equal opportunity employer and values diversity in its workforce.