Senior Deep Learning Software Engineer
Required Skills & Competences
Python @ 7, Algorithms @ 7, Machine Learning @ 4, Leadership @ 4, Communication @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
We are looking for a Senior Deep Learning Software Engineer to design and build an automated inference and deployment solution. The role focuses on defining a scalable architecture for deep learning (DL) inference that emphasizes ease of use and compute efficiency. Work spans multiple layers of the DL deployment stack: high-level framework features (PyTorch, JAX), a high-performance execution environment, and low-level GPU optimizations and custom GPU kernels (CUDA and/or Triton). This position sits at the intersection of research and engineering and requires strong ML fundamentals along with software architecture and engineering skills.
Responsibilities
- Define and implement a modular, scalable platform to bridge training and deployment workflows and enable tight integration of deployment tooling with training frameworks (e.g., Megatron, NeMo).
- Leverage and extend the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary PyTorch models for automated deployment.
- Develop support for inference optimization techniques such as speculative decoding and LoRA.
- Collaborate with teams across NVIDIA to integrate performant kernel implementations into the automated deployment solution.
- Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
- Continuously innovate to improve inference performance and help NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain market leadership.
Requirements
- Master's degree, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
- 8+ years of relevant work or research experience in Deep Learning.
- Excellent software design skills, including debugging, performance analysis, and test design.
- Strong proficiency in Python, PyTorch, and related ML tools.
- Strong algorithms and programming fundamentals.
- Good written and verbal communication skills, and the ability to work both independently and collaboratively in a fast-paced environment.
Ways to stand out
- Contributions to PyTorch, JAX, or other machine learning frameworks.
- Knowledge of GPU architecture and compilation stacks, and ability to understand and debug end-to-end performance.
- Familiarity with NVIDIA deep learning SDKs such as TensorRT.
- Prior experience writing high-performance GPU kernels for ML workloads using CUDA, CUTLASS, or Triton.
Compensation & Benefits
- Base salary ranges (location- and level-dependent):
  - Level 4: 184,000 USD - 287,500 USD per year
  - Level 5: 224,000 USD - 356,500 USD per year
- Eligible for equity and company benefits.
Location & Other Details
- Location: Santa Clara, CA, United States (hybrid work arrangement).
- Employment type: Full time.
- Applications accepted at least until July 29, 2025.
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate based on legally protected characteristics.