Senior Deep Learning Software Engineer

at NVIDIA

📍 Santa Clara, United States

USD 224,000-356,500 per year

SENIOR

✅ Hybrid

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. Proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and know all the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 7, Algorithms @ 7, Machine Learning @ 4, Leadership @ 4, Communication @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 4, AI @ 4, TensorRT @ 4, Performance Analysis @ 4

Details

We are looking for a Senior Deep Learning Software Engineer to design and build our automated inference and deployment solution. You will help define a scalable architecture for deep learning inference with emphasis on ease-of-use and compute efficiency. Work spans multiple layers of the DL deployment stack: developing features in high-level frameworks (PyTorch, JAX), designing and implementing a high-performance execution environment, low-level GPU optimizations, and developing custom GPU kernels in CUDA and/or Triton.

Responsibilities

Define and implement a modular, scalable platform to bridge training and deployment workflows and enable tight integration of deployment tooling with training frameworks such as Megatron and NeMo.
Leverage and extend the Torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary torch models for automated deployment.
Develop support for inference optimization techniques such as speculative decoding and LoRA.
Collaborate with teams across NVIDIA to integrate performant kernel implementations into the automated deployment solution.
Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain and increase market leadership.

Requirements

Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
8+ years of relevant work or research experience in deep learning.
Excellent software design skills, including debugging, performance analysis, and test design.
Strong proficiency in Python, PyTorch, and related ML tools.
Strong algorithms and programming fundamentals.
Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

Contributions to PyTorch, JAX, or other machine learning frameworks.
Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.
Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
Prior experience writing high-performance GPU kernels for ML workloads in CUDA, CUTLASS, or Triton.

Compensation & Benefits

Base salary range: 224,000 USD - 356,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
Eligible for equity and company benefits (link provided in original posting).

Additional information

Location: Santa Clara, California, United States.
Applications accepted at least until February 3, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.