Research Scientist/Engineer, Alignment Finetuning

at Anthropic

📍 San Francisco, United States

USD 315,000-340,000 per year

MIDDLE

✅ Hybrid

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ^?

Python @ 3 Machine Learning @ 3 Communication @ 3 Experimentation @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Role overview

As a Research Scientist/Engineer on the Alignment Finetuning team at Anthropic, you'll lead the development and implementation of techniques aimed at training language models that are more aligned with human values: demonstrating better moral reasoning, improved honesty, and good character. You'll develop novel finetuning techniques and use these to demonstrably improve model behavior.

Note: Interviews for this role are conducted in Python. This posting is an expression of interest; headcount for 2025 is filled and applications will be reviewed as the team grows.

Responsibilities

Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
Train models to have improved alignment properties (honesty, character, harmlessness)
Create and maintain evaluation frameworks to measure alignment properties in models
Collaborate across teams to integrate alignment improvements into production models
Develop processes to help automate and scale the team's work

Requirements

MS/PhD in Computer Science, Machine Learning, or a related field, or equivalent experience (minimum: Bachelor's degree or equivalent experience required)
Strong programming skills, especially in Python (interviews are conducted in Python)
Experience with ML model training and experimentation
Track record of implementing ML research and turning research ideas into working code
Strong analytical skills for interpreting experimental results
Experience with ML metrics and evaluation frameworks
Ability to identify and resolve practical implementation challenges

Strong candidates may also have

Experience with language model finetuning
Background in AI alignment research
Published work in ML or alignment
Experience with synthetic data generation
Familiarity with techniques like RLHF, Constitutional AI, and reward modeling
Track record of designing and implementing novel training approaches
Experience with model behavior evaluation and improvement

Compensation

Annual base salary: $315,000 - $340,000 USD
Total compensation may include equity, benefits, and incentive compensation

Logistics & Benefits

Location: San Francisco, CA (Anthropic headquarters)
Location-based hybrid policy: staff expected to be in one of Anthropic's offices at least 25% of the time
Visa sponsorship: Anthropic does sponsor visas where feasible and retains immigration counsel to assist
Benefits mentioned: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, office space for collaboration

How to apply / Additional information

Applicants are encouraged to apply even if they don't meet every qualification
The company emphasizes collaborative, large-scale empirical AI research and values communication skills
Candidate AI usage guidance and application policies are provided by Anthropic and linked in the original posting