PhD Data Generation and User Simulation Research Intern — Fall 2026

at NVIDIA

📍 World
📍 Canada
📍 United States

USD 30-94 per hour

INTERN

✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 3, Machine Learning @ 3, NLP @ 3, LLM @ 3, PyTorch @ 3, Deep Learning @ 3, AI @ 3, vLLM @ 3

Details

Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. The role sits on a research team focused on synthetic data generation (SDG) across pre-training, post-training, and evaluation infrastructure. Workstreams include population-grounded user simulation (synthetic users interacting with LLMs, calibrated against real behavioral signatures), verifier-grounded trajectory synthesis, multilingual and low-resource coverage, and SDG quality measurement across pre- and post-training corpora. The team measures success by downstream model performance (accuracy, robustness, calibration, multilingual parity, agentic safety) rather than by surface plausibility.

Responsibilities

Research innovative techniques in generative models, synthetic data generation, user simulation, reward modeling, and data-quality estimation for LLM training.
Design and apply methods for high-fidelity synthetic data (e.g., behavioral calibration of simulated users, procedurally generated probes and scenario coverage, trajectory generation guided by verification, process-reward extraction from multi-step interactions, population-aware data mixing for pre- and post-training).
Conduct experiments to validate that synthetic data measurably improves downstream model performance (accuracy, robustness, calibration, multilingual parity, agentic safety).
Collaborate with researchers and engineers to integrate methods into production training and evaluation pipelines.
Prepare research findings for internal presentations and potential publication at top-tier AI conferences.

Requirements

Currently pursuing a PhD in Computer Science, Machine Learning, Computational Linguistics, Computational Neuroscience, or an equivalent program, with a specialization in deep learning, NLP, or LLM training.
Research experience in at least one of: generative modeling, synthetic data generation, LLM post-training (SFT/RLHF/DPO/RL), reward modeling, multi-agent or interactive simulation, behavioral or cognitive modeling, or large-scale data curation.
Excellent Python programming skills.
Hands-on experience with deep learning frameworks (PyTorch) and the modern LLM training/serving stack (e.g., HuggingFace, vLLM, distributed training).
Strong research background with publications at top-tier AI, ML, or NLP conferences.

Ways to stand out

Experience training or fine-tuning LLMs end-to-end and evaluating them against real downstream tasks.
Prior work on LLM-as-judge calibration, inter-rater agreement, or evaluator robustness for subjective dimensions.
Prior work on user simulation, agent–user interaction modeling, or behavioral modeling grounded in real population data or cognitive science.
Interest or background in multilingual / low-resource / sovereign-AI evaluation and training.
Contributions to open-source projects in the SDG, LLM training, or evaluation space.

Compensation and benefits

Internship hourly rate: 30 USD - 94 USD.
Eligible for NVIDIA intern benefits (link provided in original posting).

Application and other details

Applications accepted at least until May 26, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer committed to diversity and non-discrimination.