Deep Learning Software Engineer, TensorRT Performance - New College Grad 2026

at NVIDIA
USD 124,000-241,500 per year
JUNIOR
✅ On-site

Used Tools & Technologies

GenAI

Required Skills & Competences

Software Development @ 3, Python @ 6, TensorFlow @ 3, Performance Optimization @ 3, OSS @ 3, LLM @ 3, PyTorch @ 3, CUDA @ 5, GPU @ 3, Deep Learning @ 3, Generative AI @ 3, AI @ 3, Robotics @ 3, vLLM @ 3, TensorRT @ 3, SGLang @ 3, Performance Analysis @ 3, JAX @ 3

Details

We are now looking for a Deep Learning Software Engineer, TensorRT Performance. NVIDIA is seeking an experienced Deep Learning Engineer who is passionate about analyzing and improving the performance of NVIDIA’s inference ecosystem. The team develops GPU-accelerated deep learning inference software such as TensorRT, DL benchmarking software, and performant solutions for deploying and serving models across datacenter GPUs and edge SoCs.

Responsibilities

  • Establish performance benchmarking methodologies and analysis workflows, and identify performance issues and opportunities in NVIDIA’s inference ecosystem (e.g., TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
  • Contribute features and code to NVIDIA/OSS inference frameworks including but not limited to TensorRT, TensorRT-EdgeLLM, and Torch-TensorRT.
  • Develop model pipelines for NVIDIA’s inference ecosystem focused on optimized performance in areas such as quantization, scheduling, memory management, and distributed inference.
  • Collaborate with cross-functional teams across generative AI, automotive, robotics, image understanding, and speech understanding to set directions and develop inference solutions.
  • Scale performance of deep learning models across different architectures and types of NVIDIA accelerators.

Requirements

  • Bachelor's, Master's, or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, EECS, AI, or a relevant field.
  • 2 years of relevant software development experience.
  • Strong C++ and Python programming and software engineering skills.
  • Experience with deep learning frameworks (examples: PyTorch, JAX, TensorFlow, ONNX) and inference libraries (examples: TensorRT, TensorRT-LLM, vLLM, SGLang, FlashInfer).
  • Experience with performance analysis and performance optimization.

Ways to stand out

  • Strong foundational and architectural knowledge of GPUs.
  • Deep understanding of modern deep learning models and workloads (e.g., Transformers, Recommenders, ASR, TTS, Visual Understanding).
  • Proficiency in one of the domain-specific languages for deep learning programming (examples: CUDA, TileIR, CuTeDSL, CUTLASS, Triton).
  • Prior contributions to major LLM inference frameworks (e.g., vLLM) or experience with graph compilers in deep learning inference (e.g., TorchDynamo, TorchInductor).
  • Prior experience optimizing performance for low-latency, resource-constrained systems or embedded AI pipelines (e.g., Jetson systems or other edge AI accelerators).

Compensation & Benefits

  • Base salary ranges (determined by location, experience, and pay of employees in similar positions):
    • Level 2: 124,000 USD - 195,500 USD
    • Level 3: 152,000 USD - 241,500 USD
  • You will also be eligible for equity and benefits.

Additional information

  • Applications for this job will be accepted at least until April 7, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.