Senior Applied Research Scientist, Multimodal Retrieval

at NVIDIA

📍 Santa Clara, United States

USD 224,000-356,500 per year

SENIOR

✅ Remote ✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 3, GitHub @ 4, TensorFlow @ 3, Communication @ 7, Microservices @ 4, PyTorch @ 3, GPU @ 4

Details

NVIDIA’s Retriever team is building next-generation retrieval pipelines for RAG (retrieval-augmented generation), with a focus on ingesting modalities beyond text. The team develops production-ready models and pipelines (including top research models such as NV-Embed-v2, ViDoRe V1/V2, and commercially viable versions) and works on scaling those systems for customers.

Responsibilities

Research, develop, and deploy deep learning models and pipelines that extract text and features from images, video, audio and other modalities.
Build vision pipelines for document ingestion, including page layout analysis, object detection, and OCR.
Design datasets and metrics, and run experiments and validation scripts, to create standard methodologies and guidance for customers on model/pipeline selection.
Collaborate with ML Engineers to scale pipelines for production, developing NVIDIA Inference Microservices (NIMs) and blueprints to demonstrate deployment patterns.
Write papers, blog posts, documentation, and training materials to communicate research and practical guidance.
Keep up to date with academic and industry developments in retrieval and multimodal research.

Requirements

Preferred: Master’s, Ph.D., or equivalent experience in retrieval or multimodal research; track record of publications at conferences such as CVPR, ICCV, ECCV, and KDD.
10+ years of experience developing multimodal systems across a range of models and platforms. Information retrieval experience is a strong plus.
Strong, hands-on experience developing computer vision models and pipelines, especially for document-focused tasks (layout analysis, table/figure detection, OCR). Competitive results in vision competitions (Kaggle or similar) are a plus.
Deep understanding of retrieval research, with emphasis on multimodal content retrieval and embeddings.
Knowledge of best practices in batching, streaming, and scaling of ingestion pipelines for real-world production applications.
Excellent Python programming skills and strong familiarity with the Python deep learning ecosystem (PyTorch, TensorFlow, MXNet, etc.).
Experience with GPU computing and optimizing models for GPU inference.
Ability to communicate ideas clearly through papers, blogs, kernels, and GitHub, and to mentor junior engineers and interns.
Strong communication and interpersonal skills; ability to collaborate in a dynamic, distributed team.

Benefits

Base salary range: 224,000 USD - 356,500 USD (determined based on location, experience, and comparable roles).
Eligibility for equity and NVIDIA benefits.
Flexible location; the team works remotely, with a focus on NA/EU time zones.
Opportunity to work on production-grade retrieval systems, contribute to high-impact research, and publish/communicate results.

Applications for this job will be accepted at least until September 2, 2025.

NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.