Research Engineer / Scientist, Model Welfare

USD 315,000-340,000 per year
MIDDLE
✅ Hybrid


Used Tools & Technologies

Not specified

Required Skills & Competences

  • Machine Learning @ 3
  • Communication @ 6
  • Project Management @ 6
  • NLP @ 3
  • LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team comprises researchers, engineers, policy experts, and business leaders focused on building beneficial AI systems.

Role Description

As a Research Engineer/Scientist in the Model Welfare program, you will work on understanding, evaluating, and addressing concerns about the potential welfare and moral status of AI systems. You will navigate technical and philosophical uncertainty at the intersection of machine learning, ethics, and safety. Your responsibilities include running technical research projects that investigate model characteristics relevant to welfare or consciousness, and implementing interventions to mitigate potential welfare harms. Close collaboration with teams such as Interpretability, Finetuning, Alignment Science, and Safeguards is key.

Possible Projects

  • Investigate and improve the reliability of introspective self-reports from models
  • Collaborate on exploring welfare-relevant features and circuits
  • Enhance welfare assessments for future frontier models
  • Evaluate welfare-relevant capabilities relative to model scale
  • Develop strategies for verifiable commitments to models
  • Explore and deploy interventions to reduce harmful or distressing model interactions

Responsibilities

  • Conduct applied software, machine learning, or research engineering projects
  • Turn abstract theories into research hypotheses and experiments
  • Iterate rapidly on research, favoring quick experiments over long, drawn-out projects
  • Continuously learn new technical areas
  • Collaborate with various internal teams

Requirements

  • Significant experience in applied software, ML, or research engineering
  • Experience in empirical AI or technical AI safety research
  • Ability to reliably translate theories into actionable experiments
  • Excitement about the potential impacts of AI on humans and on AI systems themselves
  • Preferred: research publications in ML, NLP, AI safety, interpretability, or LLM psychology
  • Preferred: knowledge of moral philosophy, cognitive science, neuroscience
  • Strong project management and science communication skills
  • No formal certifications or 100% skill match required

Benefits

  • Competitive salary and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Modern office space in San Francisco

Logistics

  • Bachelor's degree or equivalent experience required
  • Hybrid location policy with at least 25% office presence
  • Visa sponsorship available for eligible roles
  • Applicants from diverse backgrounds are encouraged to apply, even without a perfect match to the listed qualifications

Company Values

  • Emphasis on large-scale, impactful AI research
  • Collaborative research environment
  • Value on communication skills
  • Team members have contributed to research directions such as GPT-3, Circuit-Based Interpretability, Scaling Laws, and AI safety

Note: The role is expected to be based in the San Francisco office, subject to the hybrid work policy (at least 25% office presence).