Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 – basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 – daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 – you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 – exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Linux @ 4
Parallel Programming @ 4
Performance Optimization @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
Robotics @ 4
TensorRT @ 4
Details
NVIDIA is a global leader in physical AI, powering self-driving cars, humanoid robots, intelligent environments, and medical devices. This is a hands-on senior engineering role focused on deep learning inference optimization for autonomous vehicles and robotics on edge hardware. The work spans model-level analysis, kernel and runtime optimization, compiler interactions, and deployment on resource-constrained SoCs and GPUs.
Responsibilities
- Engage directly with automotive OEMs, robotics partners, and internal hardware teams to analyze, debug, and optimize deep learning models on NVIDIA platforms, delivering production-ready solutions.
- Own performance benchmarking efforts (MLPerf Edge and other industry benchmarks): define methodology, ensure reproducibility, and drive actionable optimization priorities.
- Inspect model architectures at the operator/kernel level and uncover performance bottlenecks through kernel traces and profiling.
- Evaluate emerging model architectures (vision encoders, multi-modal VLMs, hybrid SSM-Transformer backbones, diffusion/flow-matching decoders, multi-camera tokenizers) for compilation feasibility, memory footprint, and latency on target SoCs.
- Collaborate with compiler, runtime, and hardware teams to bridge model-level insights with platform capabilities.
- Contribute to build reviews and roadmap priorities informed by customer workload patterns.
- Represent NVIDIA externally at conferences, webinars, and partner events; share optimization expertise and contribute guidelines and best practices.
- Develop and deploy TensorRT and compiler-stack inference solutions for edge platforms (Jetson, DRIVE, GPU + ARM), create Proofs of Readiness (PORs), and work with compiler teams on Torch-TRT, MLIR-TRT, and related frameworks.
Requirements
- Master's degree or equivalent experience in Computer Science, Electrical Engineering, or a related field.
- 12+ years of industry experience, including 8+ years in deep learning model optimization, inference engineering, or neural network compilation.
- 5+ years of proven experience in embedded/edge software, delivering production inference solutions in power- and latency-constrained environments.
- Deep knowledge of modern DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language frameworks, and experience with diffusion models and/or state space models.
- Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization for heterogeneous computing.
- Experience with TensorRT, compiler IRs, or equivalent inference optimization toolchains.
- Solid understanding of embedded operating system internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts.
- Background in parallel programming (e.g., CUDA, OpenMP) and experience reasoning about memory hierarchies, data movement, and compute utilization.
- Demonstrated ability to collaborate directly with external partners and customers to solve workload and performance problems within production constraints.
Preferred / Ways to Stand Out
- Experience with ML compiler frameworks (TVM, MLIR, XLA, Triton) or contributions to inference runtime development.
- Production deployment experience with autonomous vehicle perception or planning stacks and understanding of the full pipeline from sensors to trajectory output.
- Familiarity with Physical AI model families (VLM + action expert architectures, end-to-end driving models, robot foundation models).
- Contributions to MLPerf benchmarks and large-scale industry performance optimization efforts.
- Experience with automotive safety standards (ISO 26262, SOTIF) and their implications for inference systems.
- Experience leading technical initiatives across globally distributed teams.
Compensation & Benefits
- Base salary range: 225,000 CAD - 275,000 CAD (determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and company benefits (link referenced in original posting).
Other
- Applications accepted at least until March 2, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.