Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 6
Algorithms @ 4
Data Structures @ 4
Distributed Systems @ 4
LLM @ 4
PyTorch @ 6
Deep Learning @ 4
AI @ 4
Computer Vision @ 6
Details
We are seeking a Senior Research Scientist focused on multi-modal language models to drive Nemotron multi-modal technology and deliver state-of-the-art open-source multi-modal models. The team emphasizes open models, open weights, and open data, aiming for models that work well in real-world settings and uplift the multi-modal LLM ecosystem.
Responsibilities
- Drive new abilities into multi-modal models.
- Improve generalization of existing functionalities by identifying weak points, designing data synthesis solutions, and retraining models.
- Develop recipes for training models that mix multiple modalities (text, image, video, audio, etc.).
- Design solutions that improve Pareto efficiency.
- Collaborate with researchers to translate cutting-edge ideas into production-ready implementations.
- Explore new paradigms for evaluation.
- Demonstrate strong engineering practices and contribute to open-source communities.
Requirements
- PhD in Computer Science, Electrical Engineering, or related field, or equivalent research experience in LLMs, systems, or related areas.
- 4+ years of experience in computer vision, especially multi-modal LLMs.
- Proficiency in Python with hands-on experience in frameworks such as PyTorch.
- Solid background in computer science fundamentals: algorithms, data structures, parallel/distributed computing, and systems programming.
- Proven ability to collaborate across research and engineering teams in multifaceted environments.
Ways to stand out
- Specific multi-modal LLM research experience.
- Experience developing and scaling large distributed systems for deep learning.
- Contributions to open-source LLM systems or large-scale AI infrastructure.
Compensation & Benefits
- Base salary ranges (dependent on location, level, and experience):
  - Level 4: 192,000 USD - 304,750 USD
  - Level 5: 224,000 USD - 356,500 USD
- Eligible for equity and benefits.
Additional information
- Location: Santa Clara, CA, United States.
- Employment type: Full time.
- Applications accepted at least until February 8, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and values diversity.