Senior Deep Learning Software Engineer, Inference

at NVIDIA
USD 148,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Software Development @ 6, Python @ 1, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, Agile @ 1, CUDA @ 4, GPU @ 4

Details

NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference to design, build, and optimize GPU-accelerated software that powers modern AI applications. You will work on high-performance deep learning frameworks (including SGLang and vLLM) to improve model serving and inference across NVIDIA accelerators—from datacenter GPUs to edge SoCs—using open-source tools and libraries.

Responsibilities

  • Performance optimization, analysis, and tuning of deep learning models (LLM, multimodal, and generative AI).
  • Scale performance of DL models across different NVIDIA architectures and accelerator types.
  • Contribute features and code to NVIDIA inference libraries and frameworks (vLLM, SGLang, FlashInfer, and related LLM software solutions).
  • Work cross-functionally with teams across frameworks, NVIDIA libraries, and inference optimization groups.
  • Implement and optimize model serving pipelines using tools and plugins (CUTLASS, OpenAI Triton, NCCL, CUDA kernels, etc.).

Requirements

  • Master's or PhD (or equivalent experience) in Computer Science, Computer Engineering, EECS, AI, or related field.
  • 5+ years of relevant software development experience.
  • Excellent C/C++ programming and software design skills.
  • Experience with GPU programming (CUDA) and CUDA kernels; familiarity with CUTLASS, Triton, and NCCL is a plus.
  • Prior experience deploying or optimizing DL model inference in production is a plus.
  • Background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
  • Python experience is a plus; Agile software development experience is helpful.

Preferred / Ways to Stand Out

  • Contributions to deep learning software projects (PyTorch, vLLM, SGLang).
  • Experience with multi-GPU communications and related libraries (NCCL, NVSHMEM).
  • Demonstrated work on LLMs, generative AI, and large-scale model serving.

Benefits & Compensation

  • Base salary is determined by location, experience, and comparable roles. Provided base salary ranges:
    • Level 3: 148,000 USD - 235,750 USD
    • Level 4: 184,000 USD - 287,500 USD
  • Eligible for equity and NVIDIA benefits (see NVIDIA benefits page).
  • Applications accepted at least until July 29, 2025.

Technologies & Tools Mentioned

SGLang, vLLM, FlashInfer, CUTLASS, OpenAI Triton, NCCL, NVSHMEM, CUDA, CUDA kernels, C/C++, Python, PyTorch, LLMs, generative AI, multi-GPU communications, profiling, and performance optimization.