Research Scientist/Engineer, Alignment Finetuning

USD 315,000-340,000 per year
MIDDLE
✅ Hybrid

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ?

Python @ 3 Machine Learning @ 3 Communication @ 3 Experimentation @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Role overview

As a Research Scientist/Engineer on the Alignment Finetuning team at Anthropic, you'll lead the development and implementation of techniques aimed at training language models that are more aligned with human values: demonstrating better moral reasoning, improved honesty, and good character. You'll develop novel finetuning techniques and use these to demonstrably improve model behavior.

Note: Interviews for this role are conducted in Python. This posting is an expression of interest; headcount for 2025 is filled and applications will be reviewed as the team grows.

Responsibilities

  • Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
  • Train models to have improved alignment properties (honesty, character, harmlessness)
  • Create and maintain evaluation frameworks to measure alignment properties in models
  • Collaborate across teams to integrate alignment improvements into production models
  • Develop processes to help automate and scale the team's work

Requirements

  • MS/PhD in Computer Science, Machine Learning, or a related field, or equivalent experience (minimum: Bachelor's degree or equivalent experience required)
  • Strong programming skills, especially in Python (interviews are conducted in Python)
  • Experience with ML model training and experimentation
  • Track record of implementing ML research and turning research ideas into working code
  • Strong analytical skills for interpreting experimental results
  • Experience with ML metrics and evaluation frameworks
  • Ability to identify and resolve practical implementation challenges

Strong candidates may also have

  • Experience with language model finetuning
  • Background in AI alignment research
  • Published work in ML or alignment
  • Experience with synthetic data generation
  • Familiarity with techniques like RLHF, Constitutional AI, and reward modeling
  • Track record of designing and implementing novel training approaches
  • Experience with model behavior evaluation and improvement

Compensation

  • Annual base salary: $315,000 - $340,000 USD
  • Total compensation may include equity, benefits, and incentive compensation

Logistics & Benefits

  • Location: San Francisco, CA (Anthropic headquarters)
  • Location-based hybrid policy: staff expected to be in one of Anthropic's offices at least 25% of the time
  • Visa sponsorship: Anthropic does sponsor visas where feasible and retains immigration counsel to assist
  • Benefits mentioned: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, office space for collaboration

How to apply / Additional information

  • Applicants are encouraged to apply even if they don't meet every qualification
  • The company emphasizes collaborative, large-scale empirical AI research and values communication skills
  • Candidate AI usage guidance and application policies are provided by Anthropic and linked in the original posting