Senior Research Scientist, Multi-Modal Language Models

at NVIDIA

📍 Santa Clara, United States

USD 192,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 6, Algorithms @ 4, Data Structures @ 4, Distributed Systems @ 4, LLM @ 4, PyTorch @ 6, Deep Learning @ 4, AI @ 4, Computer Vision @ 6

Details

We are seeking a Senior Research Scientist focused on multi-modal language models to drive Nemotron multi-modal technology and deliver state-of-the-art open-source multi-modal models. The team emphasizes open models, open weights, and open data, aiming for models that work well in real-world settings and uplift the multi-modal LLM ecosystem.

Responsibilities

Drive new abilities into multi-modal models.
Improve generalization of existing functionalities by identifying weak points, designing data synthesis solutions, and retraining models.
Develop recipes for training models that mix multiple modalities (text, image, video, audio, etc.).
Design solutions that improve Pareto efficiency.
Collaborate with researchers to translate cutting-edge ideas into production-ready implementations.
Explore new paradigms for evaluation.
Demonstrate strong engineering practices and contribute to open-source communities.

Requirements

PhD in Computer Science, Electrical Engineering, or related field, or equivalent research experience in LLMs, systems, or related areas.
4+ years of experience in computer vision, especially multi-modal LLMs.
Proficiency in Python with hands-on experience in frameworks such as PyTorch.
Solid background in computer science fundamentals: algorithms, data structures, parallel/distributed computing, and systems programming.
Proven ability to collaborate across research and engineering teams in multifaceted environments.

Ways to stand out

Specific multi-modal LLM research experience.
Experience developing and scaling large distributed systems for deep learning.
Contributions to open-source LLM systems or large-scale AI infrastructure.

Compensation & Benefits

Base salary ranges (dependent on location/level/experience):
- Level 4: 192,000 USD - 304,750 USD
- Level 5: 224,000 USD - 356,500 USD
Eligible for equity and benefits (link to company benefits referenced in the posting).

Additional information

Location: Santa Clara, CA, United States.
Employment type: Full time.
Applications accepted at least until February 8, 2026.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and values diversity.