Required Skills & Competences
Software Development @ 6, Python @ 1, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, Agile @ 1, CUDA @ 4, GPU @ 4
Details
NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference to design, build, and optimize GPU-accelerated software that powers modern AI applications. You will work on high-performance deep learning frameworks (including SGLang and vLLM) to improve model serving and inference across NVIDIA accelerators—from datacenter GPUs to edge SoCs—using open-source tools and libraries.
Responsibilities
- Performance optimization, analysis, and tuning of deep learning models (LLM, multimodal, and generative AI).
- Scale performance of DL models across different NVIDIA architectures and accelerator types.
- Contribute features and code to NVIDIA inference libraries and frameworks (vLLM, SGLang, FlashInfer, and related LLM software solutions).
- Work cross-functionally with teams across frameworks, NVIDIA libraries, and inference optimization groups.
- Implement and optimize model serving pipelines using tools and plugins (CUTLASS, OpenAI Triton, NCCL, CUDA kernels, etc.).
Requirements
- Master's or PhD (or equivalent experience) in Computer Science, Computer Engineering, EECS, AI, or a related field.
- 5+ years of relevant software development experience.
- Excellent C/C++ programming and software design skills.
- Experience with GPU programming (CUDA) and CUDA kernels; familiarity with CUTLASS, Triton, and NCCL is a plus.
- Prior experience deploying or optimizing DL model inference in production is a plus.
- Background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
- Python experience is a plus; Agile software development experience is helpful.
Preferred / Ways to Stand Out
- Contributions to deep learning software projects (PyTorch, vLLM, SGLang).
- Experience with multi-GPU communications and related libraries (NCCL, NVSHMEM).
- Demonstrated work on LLMs, generative AI, and large-scale model serving.
Benefits & Compensation
- Base salary is determined by location, experience, and comparable roles. Base salary ranges:
  - Level 3: 148,000 USD - 235,750 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and NVIDIA benefits (see NVIDIA benefits page).
- Applications accepted at least until July 29, 2025.
Technologies & Tools Mentioned
SGLang, vLLM, FlashInfer, CUTLASS, OpenAI Triton, NCCL, NVSHMEM, CUDA, CUDA kernels, C/C++, Python, PyTorch, LLMs, generative AI, multi-GPU communications, profiling and performance optimization.