Required Skills & Competences
- Software Development (8), Python (4), Communication (4), Mathematics (4), Debugging (4), LLM (4), PyTorch (4), GPU (4)

Details
We are looking for a Principal Software Engineer to join the TensorRT-LLM team building AI inferencing software for GPU-accelerated deep learning platforms. The role focuses on architecting robust inferencing systems, optimizing performance, and collaborating across software, research, and product teams to drive AI inferencing direction.
Responsibilities
- Architect and guide development of robust inference software that scales across multiple platforms for functionality and performance
- Perform performance analysis, optimization, and tuning of inference systems
- Follow developments in AI and evolve code design to keep pace with advances (LLMs, GenAI)
- Collaborate across the company with software, research, and product teams to guide AI inferencing direction
Requirements
- Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
- 15+ years of relevant software development experience and 2+ years in an architect/tech lead role
- Excellent Python or C/C++ programming and software design skills, including debugging, performance analysis, and test design
- Strong understanding of GenAI serving and awareness of the latest developments in deep learning such as large language models (LLMs)
- Experience with LLM inference frameworks (e.g., vLLM, SGLang)
- Experience with deep learning frameworks such as PyTorch or JAX
- Excellent written and oral communication skills in English
Benefits
- Base salary range: 272,000–425,500 USD (determined by location, experience, and pay of comparable employees)
- Eligible for equity and company benefits
- NVIDIA is an equal opportunity employer and values diversity; applications accepted at least until July 29, 2025
Additional Information
- Role type: Full time
- Office policy: Hybrid (#LI-Hybrid)
- Location: Santa Clara, CA, US
- The role offers exposure to the entire deep learning software stack and to GPU-accelerated DL platform development.