Senior Research Scientist, Multimodal Foundation Models and Robotics

at NVIDIA

📍 Santa Clara, United States

USD 192,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

LLM

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 4, Algorithms @ 4, Machine Learning @ 4, TensorFlow @ 4, PyTorch @ 4, CUDA @ 4, AI @ 6, Reinforcement Learning @ 4, Robotics @ 4, JAX @ 4

Details

We are looking for a Senior Research Scientist focused on Multimodal Foundation Models and Robotics to join the Generalist Embodied Agent Research (GEAR) group at NVIDIA. The team builds humanoid robot foundation models and systems with the mission to create general-purpose embodied agents that learn to explore and master complex skills across virtual and physical worlds. You will work with a collaborative research team producing influential work on multimodal foundation models, large-scale robot learning, game AI, and physical simulation (projects include Eureka, VIMA, Voyager, MineDojo, MimicPlay, Prismer, and Project GR00T).

Responsibilities

Design and implement novel AI algorithms and models for general-purpose humanoid robots and embodied agents.
Develop large-scale AI training and inference methods for foundation models.
Optimize and deploy AI models in physical simulation and on robot hardware.
Collaborate with research and engineering teams across NVIDIA to transfer research to products and services.

Requirements

Ph.D. in Computer Science/Engineering, Electrical Engineering, or equivalent research experience.
At least 5 years of relevant work or research experience in one or both of these fields:
- Multimodal Foundation Models
  - Hands-on training experience and publications in one or more of these topics: LLMs; large vision-language models; video generative models and diffusion algorithms; or action-based transformers.
  - Outstanding engineering skills in rapid prototyping and model training frameworks (PyTorch, JAX, TensorFlow, etc.). Python is required; proficiency in C++ and CUDA is a big plus.
  - Experience working with large-scale machine learning/AI systems and compute infrastructure.
- Robotics
  - Hands-on training experience and publications in robot learning (e.g., reinforcement learning, imitation learning, classical control methods).
  - Strong programming skills in Python and C++, and experience with ROS and machine learning frameworks such as PyTorch.
  - Deep understanding of robot kinematics, dynamics, and sensors.
  - Ability to safely operate robot hardware, lab equipment, and tools.
  - Knowledge of control methods including PID, model predictive control, and whole-body control.
  - Familiarity with physics simulation frameworks such as MuJoCo and Isaac Sim.
  - Robot hardware design and hands-on building experience.

Compensation and Additional Info

Base salary ranges (dependent on location, experience, and level):
- Level 4: 192,000 USD - 304,750 USD
- Level 5: 224,000 USD - 356,500 USD
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until May 5, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.