Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Communication @ 6
Data Analysis @ 2
LLM @ 3
AI @ 3
Details
As a Research Scientist focused on Learning & Cognitive Outcomes, you will help build the scientific and evaluation infrastructure needed to understand how AI systems affect learning, cognition, and capability development over time.
You will design rigorous studies, develop scalable evaluation methods, and help answer a central question: do AI systems help people become more capable over time? This requires measuring reasoning, metacognition, autonomy, transfer, durable skills, and other cognitive outcomes rather than only engagement or task completion. The role sits at the intersection of learning science, cognitive science, experimental design, LLM evaluation, and applied product research. You will develop cognitive outcome measures, design and manage RCTs and field studies, build classifiers and graders, guide external research partners, and translate findings into model and product improvements. The initial focus is on young users and education settings, and the work will also contribute to a broader research agenda across populations.
Responsibilities
- Design, launch, and manage randomized controlled trials (RCTs), field studies, and large-scale behavioral studies.
- Develop and validate evaluation systems for learning and cognitive outcomes, including rubrics, classifiers, graders, benchmarks, behavioral metrics, and model-based evaluators.
- Build measurement pipelines and prototype analyses; inspect model outputs and reason about classifier and grader performance.
- Detect and measure both positive and negative effects of AI use (e.g., improved reasoning, metacognition, transfer, overreliance, shallow fluency, answer-copying, reduced agency).
- Guide and collaborate with external research partners (schools, universities, education systems, research organizations, vendors) on study design, protocols, measurement strategy, implementation fidelity, analysis plans, and interpretation.
- Translate research findings into actionable recommendations for model behavior, product design, evaluation standards, and future research priorities.
- Communicate clearly to technical, scientific, partner, and executive audiences via memos, reports, protocols, presentations, and publications.
- Operate independently in ambiguous, real-world settings and balance scientific rigor with pragmatic tradeoffs.
- Represent the organization credibly in partner-facing research conversations and escalate scientific, operational, ethical, or strategic judgement calls when needed.
Requirements
- Strong grounding in learning science, cognitive science, educational psychology, behavioral science, HCI, or a related empirical field, with a clear understanding of how people acquire, retain, transfer, and apply knowledge and skills.
- Experience designing and executing rigorous empirical research, including RCTs, field experiments, large-scale behavioral studies, or other causal evaluation methods.
- Ability to design studies that measure meaningful cognitive and learning outcomes beyond engagement or short-term performance.
- Experience building and validating evaluation systems for learning and cognitive outcomes (rubrics, classifiers, graders, benchmarks, behavioral metrics, model-based evaluators).
- Technical fluency to work with data directly, prototype analyses, inspect model outputs, and collaborate effectively with data scientists and engineers.
- Understanding of LLM-based evaluation methods, including model-as-judge systems, rubric design, validation, calibration, inter-rater reliability, and precision/recall tradeoffs.
- Experience working with external partners (schools, universities, education systems, research groups, vendors) and managing external RCTs and field studies.
- Strong communication skills for diverse audiences and the ability to translate research into product and model improvements.
Nice to have
- Experience in frontier AI, big tech research, edtech, learning platforms, tutoring systems, assessment, or technically sophisticated product environments.
- Experience building or evaluating LLM-based graders, classifiers, model-as-judge systems, benchmark datasets, automated assessment tools, or behavioral measurement pipelines.
- Familiarity with outcomes such as reasoning quality, transfer, metacognition, self-regulated learning, motivation, autonomy, cognitive offloading, overreliance, help-seeking, feedback use, or durable skill acquisition.
- Experience running multi-site studies or managing external research programmes with schools, universities, governments, ministries, labs, institutional partners, or large-scale vendors.
- Familiarity with psychometrics, measurement validation, causal inference, longitudinal study design, mixed-methods research, or large-scale behavioral data analysis.
- Experience with research involving young users, consent processes, privacy constraints, ethics review, or other responsible research practices in sensitive settings.
- A track record of translating research into product, model, policy, or organizational decisions, and of publications or public research outputs in relevant fields.
- Experience working cross-functionally with product managers, engineers, data scientists, policy teams, legal teams, or communications teams.
Benefits
- Base salary within the range listed in the job posting, plus equity. Total compensation may include performance-related bonus(es) for eligible employees.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit).
- 401(k) retirement plan with employer match.
- Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks).
- Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
- 13+ paid company holidays and multiple coordinated office closures, plus paid sick or safe time as required by law.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees; additional taxable fringe benefits (charitable donation matching, wellness stipends) may be provided.
Other information
- The role is hybrid, based in London, UK, with New York City listed as an additional location.
- Background checks will be administered in accordance with applicable law. The employer is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.