Principal Software Engineer, TensorRT-LLM

at NVIDIA

📍 Santa Clara, United States

USD 272,000-425,500 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Software Development @ 8
Python @ 4
Communication @ 4
Mathematics @ 4
Debugging @ 4
LLM @ 4
PyTorch @ 4
GPU @ 4

Details

We are looking for a Principal Software Engineer to join the TensorRT-LLM team building AI inferencing software for GPU-accelerated deep learning platforms. The role focuses on architecting robust inferencing systems, optimizing performance, and collaborating across software, research, and product teams to drive AI inferencing direction.

Responsibilities

Architect and guide development of robust inference software that scales across multiple platforms in both functionality and performance
Perform performance analysis, optimization, and tuning of inference systems
Follow developments in AI and evolve code design to keep pace with advances (LLMs, GenAI)
Collaborate across the company with software, research, and product teams to guide AI inferencing direction

Requirements

Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
15+ years of relevant software development experience and 2+ years in an architect/tech lead role
Excellent Python or C/C++ programming and software design skills, including debugging, performance analysis, and test design
Strong understanding of GenAI serving and awareness of the latest developments in deep learning such as large language models (LLMs)
Experience with LLM inference frameworks (for example, vLLM or SGLang)
Experience with deep learning frameworks such as PyTorch or JAX
Excellent written and oral communication skills in English

Benefits

Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and comparable employees)
Eligible for equity and company benefits
NVIDIA is an equal opportunity employer and values diversity
Applications are accepted at least until July 29, 2025

Additional Information

Role type: Full time
Office policy: Hybrid
Location provided: US, CA, Santa Clara
Exposure to the entire deep learning software stack and GPU-accelerated DL platform development is expected.