Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and know all the pitfalls and tricks;
- 10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 7
Algorithms @ 7
Machine Learning @ 4
Leadership @ 4
Communication @ 4
Debugging @ 4
LLM @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
TensorRT @ 4
Performance Analysis @ 4
Details
We are looking for a Senior Deep Learning Software Engineer to design and build our automated inference and deployment solution. You will help define a scalable architecture for deep learning inference with an emphasis on ease of use and compute efficiency. The work spans multiple layers of the DL deployment stack: developing features in high-level frameworks (PyTorch, JAX), designing and implementing a high-performance execution environment, performing low-level GPU optimizations, and developing custom GPU kernels in CUDA and/or Triton.
Responsibilities
- Define and implement a modular, scalable platform to bridge training and deployment workflows and enable tight integration of deployment tooling with training frameworks such as Megatron and NeMo.
- Leverage and extend the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract standardized model graph representations from arbitrary PyTorch models for automated deployment.
- Develop support for inference optimization techniques such as speculative decoding and LoRA.
- Collaborate with teams across NVIDIA to integrate performant kernel implementations into the automated deployment solution.
- Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
- Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TensorRT, TRT-LLM, TRT Model Optimizer) maintain and increase market leadership.
Requirements
- Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or related field.
- 8+ years of relevant work or research experience in deep learning.
- Excellent software design skills, including debugging, performance analysis, and test design.
- Strong proficiency in Python, PyTorch, and related ML tools.
- Strong algorithms and programming fundamentals.
- Good written and verbal communication skills, and the ability to work both independently and collaboratively in a fast-paced environment.
Ways to stand out
- Contributions to PyTorch, JAX, or other machine learning frameworks.
- Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.
- Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
- Prior experience writing high-performance GPU kernels for ML workloads in CUDA, CUTLASS, or Triton.
Compensation & Benefits
- Base salary range: 224,000 USD - 356,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and company benefits.
Additional information
- Location: Santa Clara, California, United States (hybrid).
- Applications accepted at least until February 3, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.