Senior Deep Learning Software Engineer, PyTorch - TensorRT Performance

at Nvidia

📍 Santa Clara, United States

USD 148,000-287,500 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Software Development @ 4
Python @ 4
Algorithms @ 4
TensorFlow @ 4
Performance Optimization @ 4
Debugging @ 4
PyTorch @ 4
CUDA @ 6
GPU @ 4

Details

We are seeking a Senior Deep Learning Software Engineer focused on PyTorch - TensorRT performance. The role is part of NVIDIA's Deep Learning Inference research and development team, working on GPU-accelerated deep learning software such as TensorRT, Torch-TensorRT, DL benchmarking software, and performant deployment/serving solutions.

Responsibilities

Analyze performance issues and identify optimization opportunities inside Torch-TensorRT and TensorRT.
Contribute features and code to NVIDIA and open-source inference frameworks (including Torch-TensorRT, TensorRT, and PyTorch).
Implement graph compiler algorithms, frontend operators, and code generators across the PyTorch, Torch-TensorRT, and TensorRT software stack.
Collaborate with cross-functional teams across generative AI, automotive, robotics, image understanding, and speech understanding to develop inference solutions.
Scale performance of deep learning models across different architectures and NVIDIA accelerators (datacenter GPUs to edge SoCs).
Collaborate with teams on workflow improvements, performance modeling, performance analysis, kernel development, and inference software development.

Requirements

Bachelor's, Master's, or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, EECS, AI, or a related field.
At least 4 years of relevant software development experience.
Excellent Python and C++ programming, software design, and software engineering skills.
Experience with a deep learning framework such as PyTorch, JAX, or TensorFlow.
Experience with performance analysis and performance optimization.
Experience working with or contributing to inference frameworks (Torch-TensorRT, TensorRT, PyTorch) is expected.

Preferred / Ways to stand out

Architectural knowledge of GPUs.
Prior experience with AoT or JIT compilers in deep learning inference (e.g., TorchDynamo, TorchInductor).
Prior experience with performance modeling, profiling, debugging, and code optimization of DL/HPC/high-performance applications.
GPU programming experience and proficiency in a GPU DSL or libraries such as CUDA, TileIR, CuTeDSL, cutlass, or Triton.

Benefits & Compensation

Base salary ranges provided by level: Level 3: 148,000 USD - 235,750 USD; Level 4: 184,000 USD - 287,500 USD (final base depends on location, experience, and pay of employees in similar positions).
Eligible for equity and company benefits (see NVIDIA benefits page).
Applications accepted at least until December 15, 2025.

About NVIDIA

NVIDIA builds software to enable the performance optimization, deployment, and serving of deep learning solutions used across generative AI, recommenders, and vision. The company emphasizes diversity and is an equal opportunity employer.