Senior Software Engineer, Deep Learning - Torch-TRT

at Nvidia

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Software Development @ 6
Docker @ 4
Python @ 7
Machine Learning @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 4

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today NVIDIA is focusing on AI to define the next era of computing, where GPUs power computers, robots, and self-driving cars. The Solution Engineering - Automotive Machine Learning team develops technologies to deploy capable deep learning models in physical AI systems. This includes compiler technology that optimizes large models for NVIDIA hardware, and the team works closely with partners during product development.

Responsibilities

Develop compiler technologies to run various classes of model architectures (Transformer, Diffusion, VLA, CNN, RNN, etc.) on NVIDIA hardware leveraging techniques such as reduced precision, quantization, workload scheduling, and memory bandwidth optimization.
Work across the whole lifetime of a model (training, fine-tuning, and optimization) so that customers can access cutting-edge models on NVIDIA hardware.
Develop workflows that let users leverage frameworks (e.g. PyTorch, JAX) and ecosystem tools (HuggingFace, MLIR) without sacrificing performance.
Stay up to date with the latest research and innovations in deep learning; implement and experiment with new insights to improve NVIDIA's Physical AI DNNs.
Coordinate with architecture and software teams to develop solutions for partners working on NVIDIA platforms.

Requirements

MS or PhD in computer science, computer vision, robotics, computer architecture, or equivalent experience in a technical field.
5+ years of professional software development experience.
2+ years of experience implementing deep learning models and optimizations (e.g. graph fusions, kernel implementation, KV caching).
Domain experience with modern deep learning methods (e.g. diffusion models, vision-language-action models).
Strong Python and/or C/C++ programming skills.
Proven technical foundation in CPU and GPU architectures, containers (nvidia-docker), numeric libraries, and modular software design.
Strong analytical skills and willingness to take action.
Strong time-management and organizational skills for coordinating multiple initiatives and implementing new technology into complex projects.

Ways to stand out

Background in low-precision inference, quantization, and compression of DNNs.
Experience optimizing GPU workloads or developing kernels for common deep learning operators.
Experience with NVIDIA software libraries such as CUDA and TensorRT.
In-depth experience with internals of deep learning frameworks such as PyTorch or JAX (custom operators, graph fusions, model deployment).
Experience using current-generation kernel-authoring DSLs such as Triton or cuTile (or similar).

Compensation

Base salary range (Level 4): 184,000 USD - 287,500 USD.
Base salary range (Level 5): 224,000 USD - 356,500 USD.
Eligible for equity and benefits. See: https://www.nvidia.com/en-us/benefits/

Additional details

Applications for this job will be accepted at least until December 16, 2025.
NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.