Senior Deep Learning Software Engineer

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 7, Algorithms @ 7, Machine Learning @ 4, Hiring @ 4, Leadership @ 4, Communication @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4

Details

We are looking for a Senior Deep Learning Software Engineer to design and build an automated inference and deployment solution. You will help define a scalable architecture for deep learning inference with emphasis on ease-of-use and compute efficiency. Your work will span multiple layers of the deployment stack: developing features in high-level frameworks (PyTorch, JAX), designing and implementing a high-performance execution environment, performing low-level GPU optimizations, and developing custom GPU kernels in CUDA and/or Triton.

Responsibilities

Define a modular, scalable platform to bridge training and deployment workflows and enable tight integration of deployment tooling with training frameworks such as Megatron and NeMo.
Leverage and extend the Torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary torch models for automated deployment.
Develop support for inference optimization techniques such as speculative decoding and LoRA.
Collaborate with teams across NVIDIA to integrate performant kernel implementations within the automated deployment solution.
Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain and increase market leadership.

Requirements

Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
8+ years of relevant work or research experience in deep learning.
Strong proficiency in Python and PyTorch. Experience with JAX is also relevant.
Excellent software design skills, including debugging, performance analysis, and test design.
Strong algorithms and programming fundamentals.
Good written and verbal communication skills; ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

Contributions to PyTorch, JAX, or other machine learning frameworks.
Knowledge of GPU architecture and the compilation stack, with the ability to understand and debug end-to-end performance.
Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
Prior experience writing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton.

Benefits & Additional Information

Base salary range: USD 184,000-356,500 per year (varies by level and location). Equity and additional benefits are available.
Applications are accepted at least until July 29, 2025.
Location listed as Santa Clara, CA, US. Role tagged as Hybrid (#LI-Hybrid).
Employer emphasizes diversity and equal opportunity hiring.