Research Engineer / Scientist, Model Welfare

USD 315,000-340,000 per year
MIDDLE
✅ Hybrid


Used Tools & Technologies

Not specified

Required Skills & Competences

  • Machine Learning @ 3
  • Communication @ 6
  • Project Management @ 6
  • NLP @ 3
  • LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team comprises researchers, engineers, policy experts, and business leaders focused on building beneficial AI systems.

Role Description

As a Research Engineer/Scientist in the Model Welfare program, you will work on understanding, evaluating, and addressing concerns about the potential welfare and moral status of AI systems. You will navigate technical and philosophical uncertainty at the intersection of machine learning, ethics, and safety. Your responsibilities include running technical research projects that investigate model characteristics relevant to welfare or consciousness, and implementing interventions to mitigate potential welfare harms. Close collaboration with teams such as Interpretability, Finetuning, Alignment Science, and Safeguards is key.

Possible Projects

  • Investigate and improve the reliability of introspective self-reports from models
  • Collaborate on exploring welfare-relevant features and circuits
  • Enhance welfare assessments for future frontier models
  • Evaluate welfare-relevant capabilities relative to model scale
  • Develop strategies for verifiable commitments to models
  • Explore and deploy interventions to reduce harmful or distressing model interactions

Responsibilities

  • Conduct applied software, machine learning, or research engineering projects
  • Turn abstract theories into research hypotheses and experiments
  • Iterate rapidly on research, favoring quick experiments over long, drawn-out projects
  • Continuously learn new technical areas
  • Collaborate with various internal teams

Requirements

  • Significant experience in applied software, ML, or research engineering
  • Experience in empirical AI or technical AI safety research
  • Ability to reliably translate theories into actionable experiments
  • Excitement about the potential impacts of AI on humans and on AI systems themselves
  • Preferred: research publications in ML, NLP, AI safety, interpretability, or LLM psychology
  • Preferred: knowledge of moral philosophy, cognitive science, neuroscience
  • Strong project management and science communication skills
  • No formal certifications or 100% skill match required

Benefits

  • Competitive salary and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Modern office space in San Francisco

Logistics

  • Bachelor's degree or equivalent experience required
  • Hybrid location policy with at least 25% office presence
  • Visa sponsorship available for eligible roles
  • Applicants from diverse backgrounds are encouraged to apply, even without a perfect match to the listed qualifications

Company Values

  • Emphasis on large-scale, impactful AI research
  • Collaborative research environment
  • Value on communication skills
  • Team members have contributed to research directions such as GPT-3, Circuit-Based Interpretability, Scaling Laws, and AI safety

Note: The role is expected to be based in the San Francisco office, subject to the hybrid work policy (at least 25% office presence).