Researcher, Alignment Science

at OpenAI
USD 250,000-445,000 per year
MIDDLE
✅ Remote ✅ Hybrid
✅ Relocation

Used Tools & Technologies

Machine Learning LLM

Required Skills & Competences

Python @ 3 Debugging @ 6 PyTorch @ 3 AI @ 3 Reinforcement Learning @ 3

Details

About the Team

The Alignment Science team at OpenAI studies the science of intent alignment: how to train models to understand what users are actually asking for, act faithfully on that intent while respecting safety constraints, verify what they did, and report their limitations honestly. The team focuses on scalable methods for ensuring instruction-following, honesty, and robustness as models become more capable. They use a mix of training and evaluation methods, with a focus on reinforcement learning, and emphasize rigorous, quantitative research that can translate into safer model behavior.

About the Role

As a Research Engineer / Research Scientist on the Alignment Science team, you will design and run experiments that help increasingly capable models follow user intent, remain calibrated about correctness and risk, and honestly surface their own mistakes. You will work on hands-on model training, evaluation design, and research infrastructure, while helping turn promising alignment methods into techniques that can be used in frontier model development.

This role is based in San Francisco, CA. The team uses a hybrid work model of 3 days in the office per week and offers relocation assistance to new employees. They are also open to exceptional remote candidates who can operate independently and collaborate closely with the team.

Responsibilities

  • Design and implement alignment experiments focused on intent following, honesty, calibration, and robustness.
  • Train and evaluate models using reinforcement learning and other empirical ML methods.
  • Develop evaluations for failure modes such as hallucination, instruction-following failures, reward hacking, covert actions, and scheming.
  • Study methods that encourage models to verify their behavior and report shortcomings honestly, including confession-style training objectives.
  • Build monitoring and inference-time interventions that ensure compliant behavior or surface model issues to users or downstream systems.
  • Investigate how alignment methods scale with model capability, compute, data, context length, action length, and adversarial pressure.
  • Integrate successful techniques into model training and deployment workflows.
  • Produce externally publishable research when results advance the broader science of alignment.
  • Collaborate with researchers and engineers across post-training, RL, evaluations, safety, and product-facing teams.

Requirements / Qualifications

  • Strong hands-on experience training, evaluating, or debugging large ML models, especially LLMs.
  • Excellent engineering skills in Python and modern ML frameworks such as PyTorch.
  • Mathematical rigor and quantitative taste; ability to turn ambiguous research questions into measurable experiments.
  • Experience with reinforcement learning, post-training, preference optimization, scalable oversight, model evaluation, or adjacent empirical ML research.
  • Ability to operate with high independence and collaborate in fast-paced research environments.
  • Strong record in technical problem solving (for example, competitive programming, math contests, systems work, or similar projects).
  • Commitment to building AI systems that are trustworthy, honest, and reliable in high-stakes settings.

Benefits & Additional Notes

  • Base pay range: $250K - $445K, plus equity and additional compensation components.
  • Medical, dental, and vision insurance with employer contributions to HSAs; pre-tax accounts for Health FSA and Dependent Care FSA; 401(k) with employer match.
  • Paid parental, medical, and caregiver leave; flexible PTO; paid company holidays and office closures.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning & development stipend; daily meals in offices and meal delivery credits where eligible.
  • Relocation support for eligible employees; background checks administered in accordance with applicable law.
  • OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.