Deep Learning Software Engineer, TensorRT Performance - New College Grad 2026

at NVIDIA

📍 Santa Clara, United States

USD 124,000-241,500 per year

JUNIOR

✅ On-site

Used Tools & Technologies

GenAI

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Software Development @ 3, Python @ 6, TensorFlow @ 3, Performance Optimization @ 3, OSS @ 3, LLM @ 3, PyTorch @ 3, CUDA @ 5, GPU @ 3, Deep Learning @ 3, Generative AI @ 3, AI @ 3, Robotics @ 3, vLLM @ 3, TensorRT @ 3, SGLang @ 3, Performance Analysis @ 3, JAX @ 3

Details

We are now looking for a Deep Learning Software Engineer, TensorRT Performance. NVIDIA is seeking an experienced Deep Learning Engineer passionate about analyzing and improving the performance of NVIDIA’s inference ecosystem. The team develops GPU-accelerated deep learning inference software such as TensorRT, DL benchmarking software, and performant solutions for deploying and serving models across datacenter GPUs and edge SoCs.

Responsibilities

Establish performance benchmarking methodologies and analysis workflows, and identify performance issues and opportunities across NVIDIA’s inference ecosystem (e.g., TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
Contribute features and code to NVIDIA/OSS inference frameworks including but not limited to TensorRT, TensorRT-EdgeLLM, and Torch-TensorRT.
Develop model pipelines for NVIDIA’s inference ecosystem focused on optimized performance in areas such as quantization, scheduling, memory management, and distributed inference.
Collaborate with cross-functional teams across generative AI, automotive, robotics, image understanding, and speech understanding to set directions and develop inference solutions.
Scale performance of deep learning models across different architectures and types of NVIDIA accelerators.

Requirements

Bachelor's, Master's, or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, EECS, AI, or a relevant field.
2 years of relevant software development experience.
Strong C++ and Python programming and software engineering skills.
Experience with deep learning frameworks (examples: PyTorch, JAX, TensorFlow, ONNX) and inference libraries (examples: TensorRT, TensorRT-LLM, vLLM, SGLang, FlashInfer).
Experience with performance analysis and performance optimization.

Ways to stand out

Strong foundation and architectural knowledge of GPUs.
Deep understanding of modern deep learning models and workloads (e.g., Transformers, Recommenders, ASR, TTS, Visual Understanding).
Proficiency in at least one domain-specific language for deep learning programming (examples: CUDA, TileIR, CuTeDSL, CUTLASS, Triton).
Prior contributions to major LLM inference frameworks (e.g., vLLM) or experience with graph compilers in deep learning inference (e.g., TorchDynamo, TorchInductor).
Prior experience optimizing performance for low-latency, resource-constrained systems or embedded AI pipelines (e.g., Jetson systems or other edge AI accelerators).

Compensation & Benefits

Base salary ranges (determined by location, experience, and pay of employees in similar positions):
- Level 2: 124,000 USD - 195,500 USD
- Level 3: 152,000 USD - 241,500 USD
You will also be eligible for equity and benefits.

Additional information

Applications for this job will be accepted at least until April 7, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.