Used Tools & Technologies
GenAI
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. Able to teach others and familiar with the technology's pitfalls and tricks;
- 10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Software Development @ 6
C @ 4
C++ @ 6
LLM @ 4
CUDA @ 4
GPU @ 6
Generative AI @ 4
AI @ 4
Profiling @ 6
Robotics @ 4
vLLM @ 3
TensorRT @ 4
SGLang @ 3
Details
Join NVIDIA's TensorRT Edge-LLM team to push the limits of real-time large language model inference on embedded and edge platforms for automotive and robotics. The team builds the software stack enabling LLM, VLM, and multimodal models to run efficiently on-device and deliver generative AI experiences with low latency.
Responsibilities
- Develop and evolve a state-of-the-art inference framework in modern C++ that extends TensorRT with autoregressive model serving capabilities, including speculative decoding, LoRA, MoE, and KV cache management.
- Design and implement compiler and runtime optimizations tailored for transformer-based models running on constrained, real-time platforms.
- Collaborate with teams across CUDA, kernel libraries, compilers, and robotics to deliver high-performance, production-ready solutions.
- Contribute to CUDA kernel and operator development for transformer components such as attention, GEMM, and MoE.
- Benchmark, profile, and optimize inference performance across diverse embedded and automotive environments.
- Stay ahead of the evolving LLM/VLM ecosystem and bring emerging techniques into product-grade software.
Requirements
- BS, MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a closely related field.
- 4+ years of relevant software development experience.
- Deep understanding of transformer models and inference optimization techniques (e.g., quantization, tensor parallelism, memory-efficient scheduling).
- Proficiency in modern C++ (C++11/14/17 and beyond).
- Familiarity with LLM frameworks and libraries such as TensorRT, TensorRT-LLM, vLLM, SGLang, MLC-LLM, or FlashInfer.
- Strong software design, execution, and cross-disciplinary collaboration skills.
Ways to stand out from the crowd
- Demonstrated development experience or open-source contributions to LLM inference frameworks and libraries (e.g., SGLang, vLLM, FlashInfer).
- Proficiency with CUDA, including efficient kernel development, performance profiling, and GPU architecture fundamentals.
- Prior work on autoregressive LLM serving systems, including speculative decoding or KV cache management.
- Familiarity with compiler infrastructure for large language model inference.
- Exposure to robotics or embedded AI pipelines, including optimization for low-latency, resource-constrained systems.
Compensation & Benefits
- Base salary ranges (determined by location and level):
  - Level 3: 152,000 USD - 241,500 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and benefits.
Additional information
- #LI-Hybrid
- Applications for this job will be accepted at least until March 21, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.