Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. Able to teach others and familiar with the technology's pitfalls and tricks;
- 10: exceptional knowledge. Comprehensive understanding of, and adeptness in, all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Linux @ 4
Python @ 7
Debugging @ 4
LLM @ 3
PyTorch @ 3
CUDA @ 4
Deep Learning @ 8
AI @ 8
Profiling @ 4
Robotics @ 4
TensorRT @ 4
Performance Analysis @ 4
Details
Our Automotive Platform Team is building the software foundation for scalable, high-performance vehicle computing platforms that power autonomous driving, ADAS, digital cockpit, and centralized vehicle architectures. We are looking for exceptional engineers who thrive on solving deeply complex system-level challenges and shaping the future of automotive computing.
Responsibilities
- Lead architecture and technical strategy for optimizing inference workloads in autonomous driving applications.
- Drive end-to-end performance analysis across DNN models, TensorRT/compiler flows, CUDA kernels, memory behavior, scheduling, runtime services, and automotive platform constraints.
- Develop and guide model optimization techniques such as quantization, pruning, distillation, graph optimization, operator fusion, kernel selection, and layout/memory optimization.
- Collaborate with the TensorRT, CUDA, compiler, silicon architecture, perception, planning, DriveOS, and safety platform teams.
- Build tools, methodologies, and metrics for profiling, benchmarking, debugging, and validating model and platform performance.
Requirements
- BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).
- 12+ years of software engineering experience in systems software, AI/ML infrastructure, deep learning inference, compiler/runtime technology, or platform performance.
- Strong C/C++ and practical Python experience.
- Deep familiarity with TensorRT, TensorRT-LLM, ONNX, PyTorch, CUDA, Triton, or related frameworks.
- Experience optimizing DNN models for latency, throughput, memory footprint, and power.
Ways to stand out
- Hands-on experience with TensorRT internals, CUDA kernels, Triton kernels, or other compiler/runtime technologies.
- Experience deploying optimized DNNs, LLMs, VLMs, or perception models on embedded, edge, robotics, or automotive platforms.
- Background in autonomous driving, ADAS, robotics, real-time systems, safety-aware software, or deterministic low-latency systems.
- Experience with ISO 26262, QNX, Safe RTOS, DriveOS, Linux, hypervisors, or virtualization.
Compensation
- Base salary range: 224,000 USD - 356,500 USD per year.
- You will also be eligible for equity and benefits.
Additional information
- Applications for this job will be accepted at least until June 30, 2026. This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to fostering an inclusive work environment.