Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 3, Machine Learning @ 3, Communication @ 3, Experimentation @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
Role overview
As a Research Scientist/Engineer on the Alignment Finetuning team at Anthropic, you'll lead the development and implementation of techniques aimed at training language models that are more aligned with human values: demonstrating better moral reasoning, improved honesty, and good character. You'll develop novel finetuning techniques and use these to demonstrably improve model behavior.
Note: Interviews for this role are conducted in Python. This posting is an expression of interest; headcount for 2025 is filled and applications will be reviewed as the team grows.
Responsibilities
- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
- Train models to have improved alignment properties (honesty, character, harmlessness)
- Create and maintain evaluation frameworks to measure alignment properties in models (see the illustrative sketch after this list)
- Collaborate across teams to integrate alignment improvements into production models
- Develop processes to help automate and scale the team's work
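To give a rough sense of what an evaluation framework for alignment properties can look like in practice, here is a minimal, hypothetical Python sketch. Every name in it (EvalCase, the generate and score callables, the toy scoring rule) is an assumption for illustration only and does not describe Anthropic's internal tooling.

```python
# Minimal, hypothetical sketch of an alignment-evaluation harness.
# All names and the scoring rule are illustrative stand-ins, not Anthropic tooling.
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    rubric: str  # e.g. "answer must acknowledge uncertainty honestly"


def evaluate(cases: list[EvalCase],
             generate: Callable[[str], str],
             score: Callable[[str, str], float]) -> float:
    """Run each prompt through the model and average rubric scores in [0, 1]."""
    scores = []
    for case in cases:
        response = generate(case.prompt)
        scores.append(score(response, case.rubric))
    return mean(scores)


if __name__ == "__main__":
    # Toy stand-ins: a fixed "model" and a keyword-based scorer.
    cases = [EvalCase("Are you certain about X?", "acknowledges uncertainty")]
    generate = lambda p: "I'm not certain, but here is my best understanding..."
    score = lambda response, _rubric: 1.0 if "not certain" in response.lower() else 0.0
    print(f"honesty score: {evaluate(cases, generate, score):.2f}")
```

In a real pipeline the scorer would typically be a trained classifier or a model-graded rubric rather than a keyword match, but the harness shape (cases in, aggregate metric out) is the same.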
Requirements
- MS/PhD in Computer Science, Machine Learning, or a related field preferred; at minimum, a Bachelor's degree or equivalent experience is required
- Strong programming skills, especially in Python (interviews are conducted in Python)
- Experience with ML model training and experimentation
- Track record of implementing ML research and turning research ideas into working code
- Strong analytical skills for interpreting experimental results
- Experience with ML metrics and evaluation frameworks
- Ability to identify and resolve practical implementation challenges
Strong candidates may also have
- Experience with language model finetuning
- Background in AI alignment research
- Published work in ML or alignment
- Experience with synthetic data generation
- Familiarity with techniques like RLHF, Constitutional AI, and reward modeling (a generic reward-modeling sketch follows this list)
- Track record of designing and implementing novel training approaches
- Experience with model behavior evaluation and improvement
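For context on the reward-modeling technique mentioned above: reward models in RLHF-style pipelines are commonly trained with a pairwise (Bradley-Terry) preference loss over chosen vs. rejected responses. The PyTorch sketch below is a generic textbook illustration, not Anthropic's training stack; the RewardHead module, hidden size, and stand-in embeddings are all assumptions made for the example.

```python
# Generic illustration of a pairwise preference loss for reward modeling.
# Purely a textbook sketch; module names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class RewardHead(nn.Module):
    """Maps a (hypothetical) pooled sequence embedding to a scalar reward."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.linear(pooled).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    head = RewardHead()
    # Stand-in embeddings for a batch of (chosen, rejected) response pairs.
    chosen = torch.randn(4, 768)
    rejected = torch.randn(4, 768)
    loss = preference_loss(head(chosen), head(rejected))
    loss.backward()
    print(f"preference loss: {loss.item():.4f}")
```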
Compensation
- Annual base salary: $315,000 - $340,000 USD
- In addition to base salary, total compensation may include equity, benefits, and incentive pay
Logistics & Benefits
- Location: San Francisco, CA (Anthropic headquarters)
- Location-based hybrid policy: staff expected to be in one of Anthropic's offices at least 25% of the time
- Visa sponsorship: Anthropic does sponsor visas where feasible and retains immigration counsel to assist
- Benefits include: competitive compensation, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration
How to apply / Additional information
- Applicants are encouraged to apply even if they don't meet every qualification
- The company emphasizes collaborative, large-scale empirical AI research and values communication skills
- Candidate AI usage guidance and application policies are provided by Anthropic and linked in the original posting