Required Skills & Competences
- Python @ 6
- Machine Learning @ 3
- Experimentation @ 3
Details
Anthropic’s Alignment Finetuning team develops techniques to train language models that are more aligned with human values, improving moral reasoning, honesty, and character. This role focuses on creating novel finetuning approaches, building training pipelines, and measuring model alignment properties to improve model behavior in production.
Responsibilities
- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
- Train models to exhibit improved alignment properties, including honesty, character, and harmlessness
- Create and maintain evaluation frameworks and metrics to measure alignment properties in models
- Collaborate across teams to integrate alignment improvements into production models
- Develop processes to help automate and scale the team’s work
Requirements
- MS/PhD in Computer Science, Machine Learning, or a related field, or equivalent experience
- Strong programming skills, especially in Python
- Experience with ML model training and experimentation
- Track record of implementing ML research and turning research ideas into working code
- Strong analytical skills for interpreting experimental results
- Experience with ML metrics and evaluation frameworks
- Ability to identify and resolve practical implementation challenges
Strong candidates may also have
- Experience with language model finetuning
- Background in AI alignment research
- Published work in ML or alignment
- Experience with synthetic data generation
- Familiarity with techniques like RLHF, constitutional AI, and reward modeling
- Track record of designing and implementing novel training approaches
- Experience with model behavior evaluation and improvement
Logistics
- Location: San Francisco, CA
- Location-based hybrid policy: staff are expected to be in office at least 25% of the time (some roles may require more)
- Education requirements: at least a Bachelor's degree in a related field or equivalent experience
- Visa sponsorship: Anthropic can sponsor visas for some roles and will make reasonable efforts to secure a visa when making an offer
Benefits and culture
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office collaboration space
- Emphasis on collaborative, large-scale AI research and frequent research discussions
- Encouragement to apply even if you do not meet every listed qualification
Salary
- Annual salary range: $315,000 - $340,000 USD