Senior Deep Learning Software Engineer, TensorRT Performance

at NVIDIA
USD 152,000-287,500 per year
SENIOR
Hybrid

Used Tools & Technologies

GenAI

Required Skills & Competences

Software Development @ 6, Python @ 7, Algorithms @ 4, TensorFlow @ 4, Performance Optimization @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 6, GPU @ 4, Deep Learning @ 4, Generative AI @ 4, AI @ 4, Robotics @ 4, vLLM @ 4, TensorRT @ 4, SGLang @ 4, Performance Analysis @ 4

Details

We are seeking an experienced Deep Learning Engineer to analyze and improve the performance of NVIDIA's inference ecosystem. The team develops GPU-accelerated deep learning inference software such as TensorRT, DL benchmarking software, and performant solutions for deployment and serving of models across datacenter GPUs and edge SoCs.

Responsibilities

  • Establish performance benchmarking methodologies and analysis workflows; identify performance issues and opportunities across NVIDIA's inference ecosystem (e.g., TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
  • Contribute features and code to NVIDIA and open-source inference frameworks (including but not limited to TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
  • Develop model pipelines optimized for inference performance, including work in quantization, scheduling, memory management, and distributed inference.
  • Implement graph compiler algorithms, frontend operators, and code generators across the inference stack.
  • Perform performance modeling, analysis, and kernel development to scale model performance across different NVIDIA accelerator architectures.
  • Collaborate with cross-functional teams across generative AI, automotive, robotics, image understanding, and speech domains.

Requirements

  • Bachelor's, Master's, or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, EECS, AI, or a related field.
  • At least 3 years of relevant software development experience.
  • Strong C++ and Python programming and software engineering skills.
  • Experience with deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX).
  • Experience with inference libraries (e.g., TensorRT, TensorRT-LLM, vLLM, SGLang, FlashInfer).
  • Experience with performance analysis and performance optimization.

Ways to stand out

  • Strong foundation and architectural knowledge of GPUs.
  • Deep understanding of modern deep learning models and workloads (e.g., Transformers, Recommenders, ASR, TTS, Visual Understanding).
  • Proficiency in deep learning programming DSLs and GPU programming (e.g., CUDA, TileIR, CuTeDSL, CUTLASS, Triton).
  • Prior contributions to major LLM inference frameworks (e.g., vLLM) or experience with graph compilers for deep learning inference (e.g., TorchDynamo, TorchInductor).
  • Prior experience optimizing for low-latency, resource-constrained systems or embedded AI pipelines (e.g., Jetson or other edge AI accelerators).

Compensation & Benefits

  • Base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
  • You will also be eligible for equity and benefits.

Other information

  • Location: Santa Clara, CA, United States.
  • Time type: Full time. Posted as #LI-Hybrid.
  • Applications accepted at least until March 26, 2026.
  • NVIDIA uses AI tools in its recruiting processes. NVIDIA is an equal opportunity employer.