Senior Perception Engineer, Obstacle Foundation Models - Autonomous Vehicles

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Python @ 7, Algorithms @ 7, Hiring @ 4, Communication @ 4, PyTorch @ 7, CUDA @ 4, GPU @ 4, Deep Learning @ 4, AI @ 4, Computer Vision @ 7, Robotics @ 4

Details

Intelligent machines powered by artificial intelligence—computers that can learn, reason, and interact with people—are transforming every industry. GPU-accelerated deep learning provides the foundation for machines to perceive, reason, and solve complex problems. NVIDIA GPUs run deep learning algorithms that simulate aspects of human intelligence, acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world.

We are seeking an exceptional Senior Perception Engineer to help design and productize NVIDIA’s next-generation autonomous driving perception stack. You will work on the core 3D obstacle perception pipeline, contribute to architecture and algorithm design, and remain deeply hands-on with implementation, including modern transformer-based, multi-modal, and vision-language techniques where they add real value.

Responsibilities

Develop and improve the technical design, architecture, and roadmap for 3D obstacle perception to support end-to-end autonomous driving functionalities, leveraging state-of-the-art CNN and transformer-based architectures where appropriate.
Design and implement advanced 3D perception models using multi-camera inputs and/or multi-sensor fusion (camera, radar, lidar) for obstacle detection and tracking, including opportunities to explore BEV and transformer-based 3D perception.
Build efficient, production-grade deep learning models: define objectives with the team, select and prototype architectures, run experiments, and follow best practices for training and evaluation, using techniques such as large-scale pretraining, distillation, and parameter-efficient fine-tuning (e.g., LoRA).
Help define and maintain KPI frameworks to quantify perception performance; analyze large-scale real and synthetic datasets to identify failure modes and systematically improve accuracy, robustness, and efficiency, incorporating approaches like self-supervised and representation learning when beneficial.
Contribute to the data strategy for perception: specify data and labeling requirements, help prioritize data collection and annotation, and collaborate with data and ground-truth teams, including model-assisted workflows (e.g., active learning, auto-labeling, vision-language models (VLMs)) and model-in-the-loop tooling.
Collaborate with safety, systems, and software teams to ensure perception solutions meet product requirements for safety, latency, resource usage, and software robustness, and are ready for deployment at scale.

Requirements

PhD, MS, or BS (or equivalent experience) in Computer Science, Computer Engineering, or a related technical field, with 4+, 6+, or 8+ years of relevant experience, respectively.
Hands-on experience developing deep learning–based perception or closely related systems for complex real-world problems, with strong proficiency in frameworks such as PyTorch and a track record of taking models from prototype to production.
Proven experience in data-driven development, including close collaboration with data, labeling, and ground-truth teams on data strategy, labeling quality, and iterative model improvement.
Strong programming skills in Python and/or C++, with experience building reliable, high-performance, production-quality software.
Excellent communication and collaboration skills, with the ability to work effectively across multidisciplinary teams.

Preferred / Ways to stand out

Experience designing and deploying perception solutions for autonomous driving or robotics using camera-based deep learning at scale.
Hands-on experience architecting and deploying DNN-based perception pipelines on embedded or real-time platforms, including optimization for latency, memory, and compute constraints.
Familiarity with modern architectures such as CNNs and transformers, and techniques like large-scale pretraining, parameter-efficient fine-tuning (e.g., LoRA), or vision-language models (VLMs).
Strong publication record or recognized contributions in deep learning, computer vision, or autonomous systems at leading conferences/journals (e.g., CVPR, ICCV, NeurIPS, IROS).
Deep understanding of 3D computer vision fundamentals, including camera modeling and calibration (intrinsic and extrinsic), multi-view geometry, and 3D representations, ideally with experience applying these concepts in transformer-based 3D or BEV perception pipelines.
Experience with CUDA development and optimizing training or inference pipelines through custom CUDA kernels or other GPU-accelerated components.

Compensation and benefits

Base salary range: USD 184,000-287,500 for Level 4; USD 224,000-356,500 for Level 5.
You will also be eligible for equity and benefits (link provided in the original posting).

Additional information

Applications for this job will be accepted at least until March 24, 2026.
This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and emphasizes diversity and non-discrimination in hiring and promotion practices.