Senior Applied Research Scientist, Multimodal Retrieval
at NVIDIA
USD 224,000-356,500 per year
Required Skills & Competences
Python @ 3, GitHub @ 4, TensorFlow @ 3, Communication @ 7, Microservices @ 4, PyTorch @ 3, GPU @ 4
Details
NVIDIA’s Retriever team is building next-generation retrieval pipelines for RAG (retrieval-augmented generation) with a focus on ingesting modalities beyond text. The team develops production-ready models and pipelines (including top research models such as NV-Embed-v2, ViDoRe V1/V2 and commercially viable versions) and works on scaling those systems for customers.
Responsibilities
- Research, develop, and deploy deep learning models and pipelines that extract text and features from images, video, audio and other modalities.
- Build vision pipelines for document ingestion, including page layout analysis, object detection, and OCR.
- Design datasets and metrics, and run experiments and validation scripts, to establish standard methodologies and guidance for customers on model/pipeline selection.
- Collaborate with ML Engineers to scale pipelines for production, developing NVIDIA Inference Microservices (NIMs) and blueprints to demonstrate deployment patterns.
- Write papers, blog posts, documentation, and training materials to communicate research and practical guidance.
- Keep up to date with academic and industry developments in retrieval and multimodal research.
Requirements
- Preferred: Master’s, Ph.D., or equivalent experience in retrieval or multimodal research; track record of publications at conferences such as CVPR, ICCV, ECCV, and KDD.
- 10+ years of experience developing multimodal systems across a range of models and platforms. Information retrieval experience is a strong plus.
- Strong, hands-on experience developing computer vision models and pipelines, especially for document-focused tasks (layout analysis, table/figure detection, OCR). Competitive results in vision competitions (Kaggle or similar) are a plus.
- Deep understanding of retrieval research, with emphasis on multimodal content retrieval and embeddings.
- Knowledge of best practices in batching, streaming, and scaling of ingestion pipelines for real-world production applications.
- Excellent Python programming skills and strong familiarity with the Python deep learning ecosystem (PyTorch, TensorFlow, MXNet, etc.).
- Experience with GPU computing and optimizing models for GPU inference.
- Ability to communicate ideas clearly through papers, blogs, kernels, and GitHub, and to mentor junior engineers and interns.
- Strong communication and interpersonal skills; ability to collaborate in a dynamic, distributed team.
Benefits
- Base salary range: 224,000 USD - 356,500 USD (determined based on location, experience, and comparable roles).
- Eligibility for equity and NVIDIA benefits.
- Flexible location; the team works remotely, with a focus on NA/EU time zones.
- Opportunity to work on production-grade retrieval systems, contribute to high-impact research, and publish/communicate results.
Applications for this job will be accepted at least until September 2, 2025.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.