Principal Software Engineer, TensorRT-LLM

at NVIDIA
USD 272,000-425,500 per year
SENIOR
✅ Hybrid

Required Skills & Competences

Software Development @ 8 Python @ 4 Communication @ 4 Mathematics @ 4 Debugging @ 4 LLM @ 4 PyTorch @ 4 GPU @ 4

Details

We are looking for a Principal Software Engineer to join the TensorRT-LLM team building AI inferencing software for GPU-accelerated deep learning platforms. The role focuses on architecting robust inferencing systems, optimizing performance, and collaborating across software, research, and product teams to drive AI inferencing direction.

Responsibilities

  • Architect and guide development of robust inference software that scales across multiple platforms in both functionality and performance
  • Perform performance analysis, optimization, and tuning of inference systems
  • Follow developments in AI and evolve code design to keep pace with advances (LLMs, GenAI)
  • Collaborate across the company with software, research, and product teams to guide AI inferencing direction

Requirements

  • Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
  • 15+ years of relevant software development experience, including 2+ years in an architect/tech lead role
  • Excellent Python or C/C++ programming and software design skills, including debugging, performance analysis, and test design
  • Strong understanding of GenAI serving and awareness of the latest developments in deep learning such as large language models (LLMs)
  • Experience with LLM inference frameworks (for example, vLLM or SGLang)
  • Experience with deep learning frameworks such as PyTorch or JAX
  • Excellent written and oral communication skills in English

Benefits

  • Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and comparable employees)
  • Eligible for equity and company benefits
  • NVIDIA is an equal opportunity employer and values diversity
  • Applications are accepted at least until July 29, 2025

Additional Information

  • Role type: Full time
  • Office policy: Hybrid (#LI-Hybrid)
  • Location provided: US, CA, Santa Clara
  • Exposure to the entire deep learning software stack and GPU-accelerated DL platform development is expected.