Senior Deep Learning Software Engineer

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 7, Algorithms @ 7, Machine Learning @ 4, Hiring @ 4, Leadership @ 4, Communication @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4

Details

We are looking for a Senior Deep Learning Software Engineer to design and build an automated inference and deployment solution. You will help define a scalable architecture for deep learning inference with emphasis on ease-of-use and compute efficiency. Your work will span multiple layers of the deployment stack: developing features in high-level frameworks (PyTorch, JAX), designing and implementing a high-performance execution environment, performing low-level GPU optimizations, and developing custom GPU kernels in CUDA and/or Triton.

Responsibilities

  • Define a modular, scalable platform to bridge training and deployment workflows and enable tight integration of deployment tooling with training frameworks such as Megatron and NeMo.
  • Leverage and extend the Torch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary torch models for automated deployment.
  • Develop support for inference optimization techniques such as speculative decoding and LoRA.
  • Collaborate with teams across NVIDIA to integrate performant kernel implementations within the automated deployment solution.
  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
  • Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain and increase market leadership.

Requirements

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
  • 8+ years of relevant work or research experience in deep learning.
  • Strong proficiency in Python and PyTorch; JAX is also named among the frameworks the role works with.
  • Excellent software design skills, including debugging, performance analysis, and test design.
  • Strong algorithms and programming fundamentals.
  • Good written and verbal communication skills; ability to work independently and collaboratively in a fast-paced environment.

Ways to stand out

  • Contributions to PyTorch, JAX, or other machine learning frameworks.
  • Knowledge of GPU architecture and the compilation stack, with the ability to understand and debug end-to-end performance.
  • Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
  • Prior experience writing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton.

Benefits & Additional Information

  • Base salary range: USD 184,000-356,500 per year (varies by level and location). Equity and additional benefits available.
  • Applications accepted at least until July 29, 2025.
  • Location listed as Santa Clara, CA, US. Role tagged as Hybrid (#LI-Hybrid).
  • Employer emphasizes diversity and equal opportunity hiring.