Senior Deep Learning Software Engineer, Inference

at NVIDIA

📍 Santa Clara, United States

USD 148,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 6, Python @ 1, Algorithms @ 4, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, Agile @ 1, CUDA @ 4, GPU @ 4

Details

NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference for our growing team. As a key contributor, you will help design, build, and optimize the GPU-accelerated software that powers today’s most sophisticated AI applications. Our team is responsible for developing and maintaining high-performance deep learning frameworks, including SGLang and vLLM, which are at the forefront of efficient large-scale model serving and inference. You will play a central role in improving these platforms, facilitating smooth deployment and serving of groundbreaking language models.

You will work closely with the deep learning community to implement the latest algorithms for public release in frameworks like SGLang and vLLM, as well as other DL frameworks. Your work will focus on identifying and driving performance improvements for state-of-the-art LLM and Generative AI models across NVIDIA accelerators, from datacenter GPUs to edge SoCs. You will use open-source tools and plugins, including CUTLASS, OpenAI Triton, NCCL, and CUDA kernels, to implement and optimize model serving pipelines.

Responsibilities

Performance optimization, analysis, and tuning of deep learning models in domains such as LLM, multimodal, and generative AI.
Scale performance of deep learning models across different architectures and types of NVIDIA accelerators (datacenter GPUs to edge SoCs).
Contribute features and code to NVIDIA’s inference libraries, including vLLM, SGLang, FlashInfer, and other LLM software solutions.
Collaborate across teams working on frameworks, NVIDIA libraries, and inference optimization solutions.
Implement and optimize model serving pipelines using open-source tools and plugins.

Requirements

Master's, PhD, or equivalent experience in a relevant field (Computer Engineering, Computer Science, EECS, AI).
5+ years of relevant software development experience.
Excellent C/C++ programming and software design skills. Agile software development experience is helpful; Python experience is a plus.
Prior experience with training, deploying, or optimizing inference of deep learning models in production is a plus.
Background in performance modeling, profiling, debugging, code optimization, or architectural knowledge of CPU and GPU is a plus.

Ways to stand out:

Contributions to deep learning software projects such as PyTorch, vLLM, and SGLang.
Experience with multi-GPU communications (NCCL, NVSHMEM).
Experience building and shipping products to enterprise customers.
GPU programming experience (CUDA, OpenAI Triton, or CUTLASS).

Benefits

Base salary range (location- and level-dependent):
- Level 3: 148,000 USD - 235,750 USD
- Level 4: 184,000 USD - 287,500 USD
Eligibility for equity and additional benefits.

Applications for this job will be accepted at least until October 19, 2025.

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.