Senior Software Engineer – TensorRT Edge-LLM

at NVIDIA

📍 Santa Clara, United States

USD 184,000-287,500 per year

SENIOR

✅ Hybrid

Used Tools & Technologies

GenAI

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Software Development @ 6, C @ 4, C++ @ 6, LLM @ 4, CUDA @ 4, GPU @ 6, Generative AI @ 4, AI @ 4, Profiling @ 6, Robotics @ 4, vLLM @ 3, TensorRT @ 4, SGLang @ 3

Details

Join NVIDIA's TensorRT Edge-LLM team to push the limits of real-time large language model inference on embedded and edge platforms for automotive and robotics. The team builds the software stack enabling LLM, VLM, and multimodal models to run efficiently on-device and deliver generative AI experiences with low latency.

Responsibilities

Develop and evolve a state-of-the-art inference framework in modern C++ that extends TensorRT with autoregressive model serving capabilities, including speculative decoding, LoRA, MoE, and KV cache management.
Design and implement compiler and runtime optimizations tailored for transformer-based models running on constrained, real-time platforms.
Collaborate with teams across CUDA, kernel libraries, compilers, and robotics to deliver high-performance, production-ready solutions.
Contribute to CUDA kernel and operator development for transformer components such as attention, GEMM, and MoE.
Benchmark, profile, and optimize inference performance across diverse embedded and automotive environments.
Stay ahead of the evolving LLM/VLM ecosystem and bring emerging techniques into product-grade software.

Requirements

BS, MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a closely related field.
4+ years of relevant software development experience.
Deep understanding of transformer models and inference optimization techniques (e.g., quantization, tensor parallelism, memory-efficient scheduling).
Proficiency in modern C++ (C++11/14/17 and beyond).
Familiarity with LLM frameworks and libraries such as TensorRT, TensorRT-LLM, vLLM, SGLang, MLC-LLM, or FlashInfer.
Strong software design, execution, and cross-disciplinary collaboration skills.

Ways to stand out from the crowd

Demonstrated development experience or open-source contributions to LLM inference frameworks and libraries (e.g., SGLang, vLLM, FlashInfer).
Proficiency with CUDA, including efficient kernel development, performance profiling, and GPU architecture fundamentals.
Prior work on autoregressive LLM serving systems, including speculative decoding or KV cache management.
Familiarity with compiler infrastructure for large language model inference.
Exposure to robotics or embedded AI pipelines, including optimization for low-latency, resource-constrained systems.

Compensation & Benefits

Base salary ranges (determined by location and level):
- Level 3: 152,000 USD - 241,500 USD
- Level 4: 184,000 USD - 287,500 USD
Eligible for equity and benefits (the original posting links to NVIDIA's benefits page).

Additional information

Applications for this job will be accepted at least until March 21, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.