Researcher, Alignment Science

at OpenAI

📍 San Francisco, United States

USD 250,000-445,000 per year

MIDDLE

✅ Remote ✅ Hybrid

✅ Relocation

Used Tools & Technologies

Machine Learning LLM

Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 3 Debugging @ 6 PyTorch @ 3 AI @ 3 Reinforcement Learning @ 3

Details

About the Team

The Alignment Science team at OpenAI studies the science of intent alignment: how to train models to understand what users are actually asking for, act faithfully on that intent while respecting safety constraints, verify what they did, and report their limitations honestly. The team focuses on scalable methods for ensuring instruction-following, honesty, and robustness as models become more capable. They use a mix of training and evaluation methods, with a focus on reinforcement learning, and emphasize rigorous, quantitative research that can translate into safer model behavior.

About the Role

As a Research Engineer / Research Scientist on the Alignment team, you will design and run experiments that help increasingly capable models follow user intent, remain calibrated about correctness and risk, and honestly surface their own mistakes. You will work on hands-on model training, evaluation design, and research infrastructure, while helping turn promising alignment methods into techniques that can be used in frontier model development.

This role is based in San Francisco, CA. The team uses a hybrid work model of 3 days in the office per week and offers relocation assistance to new employees. They are also open to exceptional remote candidates who can operate independently and collaborate closely with the team.

Responsibilities

Design and implement alignment experiments focused on intent following, honesty, calibration, and robustness.
Train and evaluate models using reinforcement learning and other empirical ML methods.
Develop evaluations for failure modes such as hallucination, instruction-following failures, reward hacking, covert actions, and scheming.
Study methods that encourage models to verify their behavior and report shortcomings honestly, including confession-style training objectives.
Build monitoring and inference-time interventions that ensure compliant behavior or surface model issues to users or downstream systems.
Investigate how alignment methods scale with model capability, compute, data, context length, action length, and adversarial pressure.
Integrate successful techniques into model training and deployment workflows.
Produce externally publishable research when results advance the broader science of alignment.
Collaborate with researchers and engineers across post-training, RL, evaluations, safety, and product-facing teams.

Requirements / Qualifications

Strong hands-on experience training, evaluating, or debugging large ML models, especially LLMs.
Excellent engineering skills in Python and modern ML frameworks such as PyTorch.
Mathematical rigor and quantitative taste; ability to turn ambiguous research questions into measurable experiments.
Experience with reinforcement learning, post-training, preference optimization, scalable oversight, model evaluation, or adjacent empirical ML research.
Ability to operate with high independence and collaborate in fast-paced research environments.
Strong record in technical problem solving (for example, competitive programming, math contests, systems work, or similar projects).
Commitment to building AI systems that are trustworthy, honest, and reliable in high-stakes settings.

Benefits & Additional Notes

Base pay range listed: $250K - $445K; offers equity and additional compensation components.
Medical, dental, and vision insurance with employer contributions to HSAs; pre-tax accounts for Health FSA and Dependent Care FSA; 401(k) with employer match.
Paid parental, medical, and caregiver leave; flexible PTO; paid company holidays and office closures.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning & development stipend; daily meals in offices and meal delivery credits where eligible.
Relocation support for eligible employees; background checks administered in accordance with applicable law.
OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.