Senior Applied Research Scientist, Multimodal Retrieval

at NVIDIA
USD 224,000-356,500 per year
SENIOR
✅ Remote ✅ Hybrid

Required Skills & Competences

Python @ 3, GitHub @ 4, TensorFlow @ 3, Communication @ 7, Microservices @ 4, PyTorch @ 3, GPU @ 4

Details

NVIDIA’s Retriever team is building next-generation retrieval pipelines for RAG (retrieval-augmented generation) with a focus on ingesting modalities beyond text. The team develops production-ready models and pipelines (including top research models such as NV-Embed-v2, Vidore V1/V2 and commercially viable versions) and works on scaling those systems for customers.

Responsibilities

  • Research, develop, and deploy deep learning models and pipelines that extract text and features from images, video, audio and other modalities.
  • Build vision pipelines for document ingestion, including page layout analysis, object detection, and OCR.
  • Design datasets and metrics, and run experiments and validation scripts, to establish standard methodologies and guidance for customers on model and pipeline selection.
  • Collaborate with ML Engineers to scale pipelines for production, developing NVIDIA Inference Microservices (NIMs) and blueprints to demonstrate deployment patterns.
  • Write papers, blog posts, documentation, and training materials to communicate research and practical guidance.
  • Keep up to date with academic and industry developments in retrieval and multimodal research.

Requirements

  • Preferred: Master’s, Ph.D., or equivalent experience in retrieval or multimodal research; track record of publications in conferences such as CVPR, ICCV, ECCV, KDD, etc.
  • 10+ years of experience developing multimodal systems across a range of models and platforms. Information retrieval experience is a strong plus.
  • Strong, hands-on experience developing computer vision models and pipelines, especially for document-focused tasks (layout analysis, table/figure detection, OCR). Competitive results in vision competitions (Kaggle or similar) are a plus.
  • Deep understanding of retrieval research, with emphasis on multimodal content retrieval and embeddings.
  • Knowledge of best practices in batching, streaming, and scaling of ingestion pipelines for real-world production applications.
  • Excellent Python programming skills and strong familiarity with the Python deep learning ecosystem (PyTorch, TensorFlow, MXNet, etc.).
  • Experience with GPU computing and optimizing models for GPU inference.
  • Ability to communicate ideas clearly via papers, blogs, kernels, and GitHub, and to mentor junior engineers and interns.
  • Strong communication and interpersonal skills; ability to collaborate in a dynamic, distributed team.

Benefits

  • Base salary range: 224,000 USD - 356,500 USD (determined based on location, experience, and comparable roles).
  • Eligibility for equity and NVIDIA benefits.
  • Flexible location; the team works remotely, with a focus on NA/EU time zones.
  • Opportunity to work on production-grade retrieval systems, contribute to high-impact research, and publish/communicate results.

Applications for this job will be accepted at least until September 2, 2025.

NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.