Senior Software Engineer, AI and DL Kernel Libraries
at NVIDIA
Santa Clara, United States
USD 184,000-287,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: you are an expert, you can teach others, and you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 7
Machine Learning @ 4
TensorFlow @ 7
LLM @ 4
PyTorch @ 7
CUDA @ 7
GPU @ 4
Deep Learning @ 4
AI @ 4
vLLM @ 4
SGLang @ 4
Details
We are looking for outstanding AI systems engineers to develop groundbreaking technologies in the inference systems software stack. The team builds AI systems software to accelerate inference by developing libraries, code generators, and GPU kernel technologies for NVIDIA's hardware architecture. Work includes designing efficient attention kernel implementations, LLM inference runtime components, kernel code generators, and other technologies to accelerate large language models and high-impact AI workloads.
Responsibilities
- Innovate and develop new AI systems technologies for efficient inference
- Design, implement, and optimize kernels for high-impact AI workloads
- Design and implement extensible abstractions for LLM serving engines
- Build efficient just-in-time domain-specific compilers and runtimes
- Collaborate closely with engineers across deep learning frameworks, libraries, kernels, and GPU architecture teams
- Contribute to open source projects such as FlashInfer, vLLM, and SGLang
Requirements
- Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); PhD preferred
- 6+ years of academic or industry experience in ML/DL systems development preferred
- Strong experience developing or using deep learning frameworks (examples: PyTorch, JAX, TensorFlow, ONNX)
- Experience with inference engines and runtimes (examples: vLLM, SGLang, MLC) is desirable
- Strong Python and C/C++ programming skills
- Strong experience in GPU kernel development and performance optimization, especially using CUDA C/C++, cuTile, Triton, or similar technologies
Ways to Stand Out
- Background in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, FlashAttention)
- Expertise in inference engines like vLLM and SGLang
- Expertise in machine learning compilers (e.g., Apache TVM, MLIR)
- Open source project ownership or contributions
Compensation & Benefits
- Base salary range: 184,000 USD - 287,500 USD (determined based on location, experience, and pay of employees in similar positions)
- Eligible for equity and benefits
Other Information
- Applications accepted at least until March 15, 2026
- This posting is for an existing vacancy
- NVIDIA uses AI tools in its recruiting processes
- NVIDIA is an equal opportunity employer and values diversity