Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Linux @ 4
Parallel Programming @ 4
Performance Optimization @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Robotics @ 4
TensorRT @ 4
Details
NVIDIA is a global leader in physical AI, powering self-driving cars, humanoid robots, intelligent environments, and medical devices. The team builds software platforms to optimize deep learning inference for autonomous vehicles and robotics on edge devices. This is a hands-on technical specialist role focused on operator/kernel-level analysis, kernel trace analysis, and end-to-end inference performance on GPU and SoC platforms.
Responsibilities
- Address customer and partner optimization challenges by engaging with automotive OEMs and robotics partners to analyze, debug, and improve deep learning models on NVIDIA platforms; deliver solutions, not only recommendations.
- Own performance benchmarking efforts (MLPerf Edge and other industry benchmarks) including defining methodology, ensuring reproducibility, and turning results into actionable optimization priorities.
- Evaluate emerging model architectures (transformers, vision-language models, diffusion/flow matching, state space models, vision encoders, multi-camera tokenizers) for compilation feasibility, memory footprint, and latency on target SoCs.
- Collaborate across compiler, runtime, and hardware teams to connect model-level insights with platform capabilities.
- Contribute to build reviews and help develop internal roadmap priorities based on real customer workload patterns.
- Represent NVIDIA externally at conferences, webinars, and partner events to share deep learning optimization expertise and establish guidelines.
- Deliver TensorRT and compiler-stack solutions for edge: build and deploy inference solutions on Jetson, DRIVE, and GPU+ARM platforms for AV and robotics workloads; develop Proofs of Readiness (PORs) and collaborate on Torch-TRT, MLIR-TRT, and related frameworks.
Requirements
- Master's degree or equivalent experience in Computer Science, Electrical Engineering, or related field.
- Over 12 years in the industry, including at least 8 years specializing in deep learning model optimization, inference engineering, or neural network compilation; proficiency at operator/kernel level is required.
- Over 5 years of validated expertise in embedded/edge software delivering production inference solutions in power-limited, latency-sensitive environments.
- Comprehensive knowledge of contemporary DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language frameworks, diffusion models, and/or state space models.
- Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization on heterogeneous computing platforms; experience with TensorRT and compiler IRs or equivalent inference optimization toolchains.
- Solid understanding of embedded OS internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts.
- Background in parallel programming (e.g., CUDA, OpenMP) and reasoning about memory hierarchies, data movement, and compute utilization.
- Demonstrated ability to collaborate directly with external partners and customers in a deep technical role to solve workload and performance issues within production constraints.
Ways to Stand Out
- Experience with ML compiler frameworks (TVM, MLIR, XLA, Triton) or contributing to inference runtime development.
- Production deployment experience with autonomous vehicle perception or planning stacks, understanding the full pipeline from sensor input through trajectory output.
- Familiarity with the Physical AI model landscape: VLM + action expert architectures, end-to-end driving models, or robot foundation models.
- Contributions to MLPerf benchmarks and large-scale industry performance optimization efforts.
- Experience with automotive safety standards (ISO 26262, SOTIF) and their implications for inference system development.
Compensation & Other Details
- Base salary range: 224,000 USD - 356,500 USD (determined based on location, experience, and comparable pay).
- Eligible for equity and company benefits.
- Applications accepted at least until March 15, 2026. This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.