Senior Deep Learning Software Engineer, Inference

at NVIDIA
USD 148,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 6, Python @ 3, Algorithms @ 4, Performance Optimization @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA seeks a Senior Software Engineer specializing in Deep Learning Inference to design, build, and optimize GPU-accelerated software that powers advanced AI applications. The team develops and maintains high-performance deep learning frameworks and inference software (including SGLang, vLLM, and FlashInfer) to enable efficient large-scale model serving across NVIDIA accelerators, from data-center GPUs to edge SoCs. You will work with the deep learning community to implement the latest algorithms, drive performance improvements for LLMs and generative AI models, and integrate open-source tools and NVIDIA libraries into production inference pipelines.

Responsibilities

  • Performance optimization, analysis, and tuning of deep learning models across domains such as LLM, multimodal, and generative AI.
  • Scale the performance of DL models across different architectures and types of NVIDIA accelerators (data-center GPUs to edge SoCs).
  • Contribute features and code to NVIDIA inference libraries and frameworks (vLLM, SGLang, FlashInfer and related LLM software solutions).
  • Collaborate across teams working on frameworks, NVIDIA libraries, and inference optimization solutions to enable smooth deployment and serving of language models.
  • Use and integrate open-source tools and plugins (CUTLASS, Triton, NCCL, CUDA kernels, etc.) to implement and optimize model serving pipelines.

Requirements

  • MS or PhD (or equivalent experience) in Computer Engineering, Computer Science, EECS, AI, or related fields.
  • 5+ years of relevant software development experience.
  • Excellent C/C++ programming and software design skills.
  • Familiarity with Python is a plus.
  • Prior experience with training, deploying, or optimizing inference of DL models in production is a plus.
  • Background in performance modeling, profiling, debugging, and code optimization; architectural knowledge of CPU and GPU is a plus.
  • GPU programming experience (CUDA, Triton or CUTLASS) is a plus.

Ways to stand out

  • Contributions to deep learning software projects (PyTorch, vLLM, SGLang).
  • Experience with multi-GPU communications (NCCL, NVSHMEM).

Compensation & Benefits

  • Base salary range by level: Level 3: 148,000 USD - 235,750 USD; Level 4: 184,000 USD - 287,500 USD.
  • Eligible for equity and a comprehensive benefits package. (Link to NVIDIA benefits provided in original posting.)

Additional information

  • Full-time role. Applications accepted at least until September 9, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.