Senior Vision Language Model Engineer

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 4
  • Algorithms @ 4
  • Communication @ 4
  • Debugging @ 4
  • Deep Learning @ 7
  • AI @ 4
  • Robotics @ 4
  • Agentic AI @ 4

Details

NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a senior vision language model engineer to design and build agentic data and training workflows for autonomous vehicles, robotics, and medical applications. The right person for this role brings technical innovation and a collaborative spirit to change the way NVIDIA builds dataset search platforms for physical AI developers. Our dataset search offerings are easy to use, performant, and scalable. Your work will redefine the dataset search and model training capabilities in NVIDIA product offerings and impact the most iconic companies in physical AI.

Responsibilities

  • Partner with researchers to develop and evaluate prototypes of the latest models (such as VLMs and VLAs) for video search, video understanding, and related tasks to enable advances in autonomous driving, healthcare, and robotics.
  • Design and implement agentic data workflows that automate data discovery, labeling, evaluation, and retraining to maximize development velocity.
  • Build, curate, and maintain high-quality multimodal datasets (video, sensor, language/action traces) tailored for end-to-end physical AI problems (e.g., autonomous driving).
  • Explore and productize new data sources including simulation and synthetic data.
  • Use agentic AI workflows across the full applied research lifecycle: prototyping algorithms and search pipelines, benchmarking, and integrating prototypes into production codebases.
  • Collaborate with research, model development, performance, and product teams.
  • Contribute to NVIDIA Cosmos Dataset Search and other core NVIDIA platforms and products.

Requirements

  • PhD, MS, or BS (or equivalent experience) in Computer Science, Computer Engineering, or a related technical field, with 4+, 6+, or 8+ years of relevant experience, respectively.
  • Strong background in modern deep learning, including transformer-based architectures, video modeling, and multimodal VLM/VLA or foundation models.
  • Extensive experience training and deploying deep learning models on real-world datasets: data preprocessing, distributed training, evaluation, debugging, and iterative improvement.
  • Strong proficiency in Python and at least one deep learning framework.
  • Up to date with the latest research on image and video search in autonomous vehicles, healthcare, robotics, or related physical AI applications.
  • Fluent with agentic AI workflows across the full applied research lifecycle, including prototyping novel algorithms and search pipelines, benchmarking, and integrating prototypes into production codebases.
  • Clear and effective communication skills; experience working in dynamic, product- and research-focused teams.

Ways to Stand Out

  • Strong track record publishing in top-tier conferences such as CVPR, NeurIPS, ICML, ECCV.
  • Patents in video retrieval or related fields.
  • Strong software architecture skills demonstrated through contributions to large internal or open-source projects.
  • Experience in robotic systems such as autonomous vehicles or humanoid robotics.

Compensation and Benefits

  • Base salary ranges by level: Level 4: 184,000 USD - 287,500 USD; Level 5: 224,000 USD - 356,500 USD.
  • You will also be eligible for equity and benefits.

Additional Information

  • Applications accepted at least until May 17, 2026. This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.