Senior Deep Learning Software Engineer, FlashInfer

at NVIDIA

📍 Santa Clara, United States

USD 184,000-287,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Python @ 7, Machine Learning @ 4, TensorFlow @ 7, LLM @ 4, PyTorch @ 7, CUDA @ 7, GPU @ 4

Details

We are looking for a Senior Deep Learning Software Engineer to join the FlashInfer team at NVIDIA. You will help develop AI systems software to accelerate inference, including libraries, code generators, and GPU kernel technologies for NVIDIA hardware. Work includes designing and building efficient attention kernel implementations, LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other AI workloads.

Responsibilities

Innovate and develop new AI systems technologies for efficient inference
Design, implement, and optimize kernels for high-impact AI workloads
Design and implement extensible abstractions for LLM serving engines
Build efficient just-in-time domain-specific compilers and runtimes
Collaborate closely with engineers across deep learning frameworks, libraries, kernels, and GPU architecture teams
Contribute to open source communities and projects such as FlashInfer, vLLM, and SGLang

Requirements

Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); PhD preferred
6+ years of academic or industry experience in ML/DL systems development preferred
Strong experience developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX)
Experience with inference engines and runtimes such as vLLM, SGLang, and MLC is desirable
Strong programming skills in Python and C/C++

Ways to stand out

Background in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, Flash Attention)
Expertise with inference engines like vLLM and SGLang
Experience with machine learning compilers (e.g., Apache TVM, MLIR)
Strong experience in GPU kernel development and performance optimization (especially using CUDA C/C++, cuTile, Triton, or similar)
Open source project ownership or contributions

Compensation & Benefits

Base salary range: 184,000 USD - 287,500 USD (determined based on location, experience, and internal pay bands)
Eligible for equity and benefits

Additional details

Applications for this job will be accepted at least until August 5, 2025
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment