Senior DL Software Engineer, Model Optimization and Edge Deployment - Autonomous Vehicles

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Algorithms @ 4, Machine Learning @ 6, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 7, Deep Learning @ 4, AI @ 7, Profiling @ 4, Robotics @ 4, vLLM @ 6, TensorRT @ 4, SGLang @ 6, JAX @ 6

Details

NVIDIA is seeking a high-caliber Deep Learning Engineer to bridge cutting-edge multimodal architectures and real-time robotic execution for autonomous vehicles. In this role, you will design and implement state-of-the-art algorithms that make LLMs and VLMs fast, lean, and reliable enough to power an end-to-end driving stack. You will re-architect models for the edge to meet the strict latency and safety constraints of an AV compute platform, and integrate large-scale models within a high-performance C++ production environment.

Responsibilities

  • Develop state-of-the-art model optimization techniques (examples: speculative decoding with block diffusion, KV cache streaming, Prefill–Decode separation) to boost end-to-end model performance for production deployments.
  • Implement advanced compression techniques including quantization (FP4/FP8), pruning, and knowledge distillation to minimize model footprints while preserving safety-critical accuracy.
  • Design high-performance inference optimizations, including automated model sharding (tensor/sequence parallelism) and efficient attention kernels optimized for KV-caching.
  • Conduct deep, layer-by-layer model profiling to identify compute and memory bottlenecks and drive targeted optimizations for real-time execution.
  • Leverage the PyTorch ecosystem to extract standardized model graph representations and automate deployment pipelines for TensorRT conversion.
  • Scale deep learning model performance across NVIDIA edge architectures to maximize the on-road throughput of specialized accelerators.
  • Architect software interfaces to integrate and interact with large-scale models within a high-performance C++ production environment.
  • Partner with research, TensorRT, and Cosmos teams to translate innovations into shipping product solutions.

Requirements

  • PhD, MS, or BS (or equivalent experience) in Computer Science, Computer Engineering, or a related technical field, with 4+, 6+, or 8+ years of relevant experience, respectively.
  • Expert-level proficiency in PyTorch, JAX, or similar machine learning frameworks.
  • Advanced proficiency with modern LLM/VLM inference stacks such as vLLM, TensorRT-LLM, and SGLang.
  • Proven track record of training, deploying, or optimizing large-scale deep learning models in production environments.
  • Deep familiarity with NVIDIA deep learning SDKs, specifically TensorRT and CUDA.
  • Strong understanding of GPU architecture and the compilation stack, with the ability to debug end-to-end performance across the hardware/software boundary.

Ways to Stand Out

  • Deep experience with LLM, VLM, and VLA model optimization tailored for real-time robotic control, embodied AI, and autonomous decision-making.
  • Proven track record of implementing low-bit inference.
  • Prior experience writing custom high-performance kernels using CUDA, Triton, or CUTLASS to accelerate non-standard layers and specialized attention mechanisms.
  • Active contributions to open-source inference and optimization libraries such as vLLM, SGLang, and TensorRT-LLM.
  • Thorough understanding of real-time robotics constraints including safety-critical determinism, hardware-in-the-loop (HIL) testing, and ultra-low latency requirements.

Compensation & Benefits

  • Base salary range:
    • Level 4: 184,000 USD - 287,500 USD
    • Level 5: 224,000 USD - 356,500 USD
  • Eligible for equity and benefits. (Link to NVIDIA benefits referenced in original posting.)

Additional Information

  • Applications will be accepted at least until April 25, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to a diverse work environment.