Senior Systems Software Engineer - Deep Learning Solutions

at NVIDIA

📍 Santa Clara, United States

USD 224,000-356,500 per year

SENIOR

✅ On-site

Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Linux @ 4 Parallel Programming @ 4 Performance Optimization @ 4 CUDA @ 4 GPU @ 4 Deep Learning @ 4 AI @ 4 Robotics @ 4 TensorRT @ 4

Details

NVIDIA is a global leader in physical AI, powering self-driving cars, humanoid robots, intelligent environments, and medical devices. The team builds software platforms that optimize deep learning inference for autonomous vehicles and robotics on edge devices. This is a hands-on technical specialist role focused on operator/kernel-level analysis, kernel trace analysis, and end-to-end inference performance on GPU and SoC platforms.

Responsibilities

Address customer and partner optimization challenges by engaging with automotive OEMs and robotics partners to analyze, debug, and improve deep learning models on NVIDIA platforms; deliver solutions, not only recommendations.
Own performance benchmarking efforts (MLPerf Edge and other industry benchmarks) including defining methodology, ensuring reproducibility, and turning results into actionable optimization priorities.
Evaluate emerging model architectures (transformers, vision-language models, diffusion/flow matching, state space models, vision encoders, multi-camera tokenizers) for compilation feasibility, memory footprint, and latency on target SoCs.
Collaborate across compiler, runtime, and hardware teams to connect model-level insights with platform capabilities.
Contribute to build reviews and help develop internal roadmap priorities based on real customer workload patterns.
Represent NVIDIA externally at conferences, webinars, and partner events to share deep learning optimization expertise and establish best-practice guidelines.
Deliver TensorRT and compiler-stack solutions for edge: build and deploy inference solutions on Jetson, DRIVE, and GPU+ARM platforms for AV and robotics workloads; develop Proofs of Readiness (PORs) and collaborate on Torch-TRT, MLIR-TRT, and related frameworks.

Requirements

Master's degree or equivalent experience in Computer Science, Electrical Engineering, or related field.
Over 12 years of industry experience, including at least 8 years specializing in deep learning model optimization, inference engineering, or neural network compilation; proficiency at the operator/kernel level is required.
Over 5 years of proven expertise in embedded/edge software, delivering production inference solutions in power-limited, latency-sensitive environments.
Comprehensive knowledge of contemporary DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language frameworks, diffusion models, and/or state space models.
Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization using heterogeneous computing; experience with TensorRT and compiler IRs or equivalent inference optimization toolchains.
Solid understanding of embedded OS internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts.
Background in parallel programming (e.g., CUDA, OpenMP) and reasoning about memory hierarchies, data movement, and compute utilization.
Demonstrated ability to collaborate directly with external partners and customers in a deep technical role to solve workload and performance issues within production constraints.

Ways to Stand Out

Experience with ML compiler frameworks (TVM, MLIR, XLA, Triton) or contributing to inference runtime development.
Production deployment experience with autonomous vehicle perception or planning stacks, understanding the full pipeline from sensor input through trajectory output.
Familiarity with the Physical AI model landscape: VLM + action expert architectures, end-to-end driving models, or robot foundation models.
Contributions to MLPerf benchmarks and large-scale industry performance optimization efforts.
Experience with automotive safety standards (ISO 26262, SOTIF) and their implications for inference system development.

Compensation & Other Details

Base salary range: 224,000 USD - 356,500 USD (determined based on location, experience, and the pay of employees in comparable positions).
Eligible for equity and company benefits.
Applications accepted at least until March 15, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.