Senior Software Engineer – TensorRT Edge-LLM

at NVIDIA
USD 184,000-287,500 per year
SENIOR
Hybrid

Tools & Technologies

GenAI

Required Skills & Competences

Software Development @ 6, C @ 4, C++ @ 6, LLM @ 4, CUDA @ 4, GPU @ 6, Generative AI @ 4, AI @ 4, Profiling @ 6, Robotics @ 4, vLLM @ 3, TensorRT @ 4, SGLang @ 3

Details

Join NVIDIA's TensorRT Edge-LLM team to push the limits of real-time large language model (LLM) inference on embedded and edge platforms for automotive and robotics. The team builds the software stack that enables LLMs, vision-language models (VLMs), and other multimodal models to run efficiently on-device, delivering low-latency generative AI experiences.

Responsibilities

  • Develop and evolve a state-of-the-art inference framework in modern C++ that extends TensorRT with autoregressive model-serving capabilities, including speculative decoding, LoRA adapters, mixture-of-experts (MoE) models, and KV cache management.
  • Design and implement compiler and runtime optimizations tailored for transformer-based models running on constrained, real-time platforms.
  • Collaborate with teams across CUDA, kernel libraries, compilers, and robotics to deliver high-performance, production-ready solutions.
  • Contribute to CUDA kernel and operator development for transformer components such as attention, GEMM, and MoE.
  • Benchmark, profile, and optimize inference performance across diverse embedded and automotive environments.
  • Stay ahead of the evolving LLM/VLM ecosystem and bring emerging techniques into product-grade software.

Requirements

  • BS, MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a closely related field.
  • 4+ years of relevant software development experience.
  • Deep understanding of transformer models and inference optimization techniques (e.g., quantization, tensor parallelism, memory-efficient scheduling).
  • Proficiency in modern C++ (C++11/14/17 and later).
  • Familiarity with LLM frameworks and libraries such as TensorRT, TensorRT-LLM, vLLM, SGLang, MLC-LLM, or FlashInfer.
  • Strong software design, execution, and cross-disciplinary collaboration skills.

Ways to stand out from the crowd

  • Demonstrated development experience or open-source contributions to LLM inference frameworks and libraries (e.g., SGLang, vLLM, FlashInfer).
  • Proficiency with CUDA, including efficient kernel development, performance profiling, and GPU architecture fundamentals.
  • Prior work on autoregressive LLM serving systems, including speculative decoding or KV cache management.
  • Familiarity with compiler infrastructure for large language model inference.
  • Exposure to robotics or embedded AI pipelines, including optimization for low-latency, resource-constrained systems.

Compensation & Benefits

  • Base salary ranges (determined by location and level):
    • Level 3: 152,000 USD - 241,500 USD
    • Level 4: 184,000 USD - 287,500 USD
  • Eligible for equity and benefits.

Additional information

  • Applications for this job will be accepted at least until March 21, 2026.
  • NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.