Senior Deep Learning Software Engineer, TensorRT Performance

at NVIDIA

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ Hybrid

Used Tools & Technologies

GenAI

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Software Development @ 6, Python @ 7, Algorithms @ 4, TensorFlow @ 4, Performance Optimization @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 6, GPU @ 4, Deep Learning @ 4, Generative AI @ 4, AI @ 4, Robotics @ 4, vLLM @ 4, TensorRT @ 4, SGLang @ 4, Performance Analysis @ 4

Details

We are seeking an experienced Deep Learning Engineer to analyze and improve the performance of NVIDIA's inference ecosystem. The team develops GPU-accelerated deep learning inference software such as TensorRT, DL benchmarking software, and performant solutions for deployment and serving of models across datacenter GPUs and edge SoCs.

Responsibilities

Establish performance benchmarking methodologies and analysis workflows; identify performance issues and opportunities across NVIDIA's inference ecosystem (e.g., TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
Contribute features and code to NVIDIA and open-source inference frameworks (including but not limited to TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
Develop model pipelines optimized for inference performance, including work in quantization, scheduling, memory management, and distributed inference.
Implement graph compiler algorithms, frontend operators, and code generators across the inference stack.
Carry out performance modeling, analysis, and kernel development to scale model performance across different NVIDIA accelerator architectures.
Collaborate with cross-functional teams across generative AI, automotive, robotics, image understanding, and speech domains.

Requirements

Bachelor's, Master's, or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, EECS, AI, or a related field.
At least 3 years of relevant software development experience.
Strong C++ and Python programming and software engineering skills.
Experience with deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX).
Experience with inference libraries (e.g., TensorRT, TensorRT-LLM, vLLM, SGLang, FlashInfer).
Experience with performance analysis and performance optimization.

Ways to stand out

Strong foundational and architectural knowledge of GPUs.
Deep understanding of modern deep learning models and workloads (e.g., Transformers, Recommenders, ASR, TTS, Visual Understanding).
Proficiency in deep learning programming DSLs and GPU programming (e.g., CUDA, TileIR, CuTeDSL, CUTLASS, Triton).
Prior contributions to major LLM inference frameworks (e.g., vLLM) or experience with graph compilers for deep learning inference (e.g., TorchDynamo, TorchInductor).
Prior experience optimizing for low-latency, resource-constrained systems or embedded AI pipelines (e.g., Jetson or other edge AI accelerators).

Compensation & Benefits

The base salary range is 152,000 USD - 241,500 USD for Level 3 and 184,000 USD - 287,500 USD for Level 4.
You will also be eligible for equity and benefits.

Other information

Location: Santa Clara, CA, United States.
Time type: Full time. Posted as #LI-Hybrid.
Applications are accepted until at least March 26, 2026.
NVIDIA uses AI tools in its recruiting processes. NVIDIA is an equal opportunity employer.