Principal Deep Learning Software Engineer, LLM Performance

at Nvidia

📍 Santa Clara, United States

USD 272,000-425,500 per year

SENIOR

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 4
Python @ 4
Algorithms @ 4
TensorFlow @ 4
Performance Optimization @ 4
Debugging @ 4
LLM @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 4

Details

NVIDIA is seeking an experienced Deep Learning Software Engineer to analyze and improve the performance of large language model (LLM) inference. The role sits within the DL Architecture team, which works on GPU-accelerated deep learning software such as TensorRT and inference benchmarking frameworks. You will implement and optimize LLM inference, serving, and deployment algorithms on NVIDIA accelerators ranging from datacenter GPUs to edge SoCs, and collaborate with teams in performance modeling, kernel development, and inference software.

Responsibilities

Performance optimization, analysis, and tuning of LLM, VLM, and generative AI models for deep learning inference, serving, and deployment in NVIDIA and open-source LLM frameworks.
Scale the performance of LLMs across different architectures and NVIDIA accelerators to achieve maximum throughput, minimum latency, and maximum throughput under latency constraints.
Contribute features and code to NVIDIA and open-source LLM frameworks, inference benchmarking frameworks, TensorRT, and Triton.
Implement LLM inference, serving, and deployment algorithms and optimizations using TensorRT-LLM, vLLM, SGLang, Triton, and CUDA kernels.
Work cross-functionally with teams across generative AI, automotive, image understanding, and speech understanding to develop performant solutions.

Requirements

Bachelor’s, Master’s, or PhD degree, or equivalent experience, in Computer Engineering, Computer Science, EECS, AI, or a related field.
At least 12 years of relevant software development experience.
Excellent programming and software engineering skills in Python, C, and C++.
Experience with a deep learning framework such as PyTorch, JAX or TensorFlow.

Ways to stand out / Preferred Qualifications

Prior experience with an LLM framework or a deep learning compiler, covering inference, deployment, algorithms, or implementation.
Experience with performance modeling, profiling, debugging, and code optimization of deep learning, HPC or other high-performance applications.
Architectural knowledge of CPUs and GPUs.
GPU programming experience (CUDA or OpenCL).

Compensation & Benefits

Base salary range: 272,000 USD - 425,500 USD (final base determined by location, experience, and internal pay equity).
Eligible for equity and company benefits (link provided in original posting).
NVIDIA is an equal opportunity employer committed to diversity and inclusion.

Other Details

Location: Santa Clara, CA, United States.
Office policy: Hybrid.
Employment type: Full time.
Applications accepted at least until July 29, 2025.