Senior Deep Learning Software Engineer, Inference

at NVIDIA

📍 Santa Clara, United States

USD 148,000-287,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Software Development @ 6, Python @ 1, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, Agile @ 1, CUDA @ 4, GPU @ 4

Details

NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference to design, build, and optimize GPU-accelerated software that powers modern AI applications. You will work on high-performance deep learning frameworks (including SGLang and vLLM) to improve model serving and inference across NVIDIA accelerators, from datacenter GPUs to edge SoCs, using open-source tools and libraries.

Responsibilities

Optimize, analyze, and tune the performance of deep learning models (LLM, multimodal, and generative AI).
Scale performance of DL models across different NVIDIA architectures and accelerator types.
Contribute features and code to NVIDIA inference libraries and frameworks (vLLM, SGLang, FlashInfer, and related LLM software solutions).
Work cross-functionally with teams across frameworks, NVIDIA libraries, and inference optimization groups.
Implement and optimize model serving pipelines using tools and plugins (CUTLASS, OpenAI Triton, NCCL, CUDA kernels, etc.).

Requirements

Master's or PhD (or equivalent experience) in Computer Science, Computer Engineering, EECS, AI, or a related field.
5+ years of relevant software development experience.
Excellent C/C++ programming and software design skills.
Experience with GPU programming (CUDA) and CUDA kernels; familiarity with CUTLASS, Triton, and NCCL is a plus.
Prior experience deploying or optimizing DL model inference in production is a plus.
Background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
Python experience is a plus; Agile software development experience is helpful.

Preferred / Ways to Stand Out

Contributions to deep learning software projects (PyTorch, vLLM, SGLang).
Experience with multi-GPU communications and related libraries (NCCL, NVSHMEM).
Demonstrated work on LLMs, generative AI, and large-scale model serving.

Benefits & Compensation

Base salary is determined by location, experience, and comparable roles. Provided base salary ranges:
- Level 3: USD 148,000-235,750
- Level 4: USD 184,000-287,500
Eligible for equity and NVIDIA benefits (see NVIDIA benefits page).
Applications accepted at least until July 29, 2025.

Technologies & Tools Mentioned

SGLang, vLLM, FlashInfer, CUTLASS, OpenAI Triton, NCCL, NVSHMEM, CUDA, CUDA kernels, C/C++, Python, PyTorch, LLMs, generative AI, multi-GPU communications, profiling and performance optimization.