Research Scientist/Engineer, Alignment Finetuning

at Anthropic

📍 San Francisco, United States

USD 315,000-340,000 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 6
Machine Learning @ 3
Experimentation @ 3

Details

Anthropic’s Alignment Finetuning team develops techniques to train language models that are more aligned with human values, improving moral reasoning, honesty, and character. This role focuses on creating novel finetuning approaches, building training pipelines, and measuring model alignment properties to improve model behavior in production.

Responsibilities

Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
Train models to exhibit improved alignment properties, including honesty, character, and harmlessness
Create and maintain evaluation frameworks and metrics to measure alignment properties in models
Collaborate across teams to integrate alignment improvements into production models
Develop processes to help automate and scale the team’s work

Requirements

MS/PhD in Computer Science, Machine Learning, or a related field, or equivalent experience
Strong programming skills, especially in Python
Experience with ML model training and experimentation
Track record of implementing ML research and turning research ideas into working code
Strong analytical skills for interpreting experimental results
Experience with ML metrics and evaluation frameworks
Ability to identify and resolve practical implementation challenges

Strong candidates may also have

Experience with language model finetuning
Background in AI alignment research
Published work in ML or alignment
Experience with synthetic data generation
Familiarity with techniques like RLHF, constitutional AI, and reward modeling
Track record of designing and implementing novel training approaches
Experience with model behavior evaluation and improvement

Logistics

Location: San Francisco, CA
Location-based hybrid policy: staff are expected to be in office at least 25% of the time (some roles may require more)
Education requirements: at least a Bachelor's degree in a related field or equivalent experience
Visa sponsorship: Anthropic can sponsor visas for some roles and, when it makes an offer, will make reasonable efforts to secure a visa for the candidate

Benefits and culture

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office collaboration space
Emphasis on collaborative, large-scale AI research and frequent research discussions
Encouragement to apply even if you do not meet every listed qualification

Salary

Annual salary range: $315,000 - $340,000 USD