Research Scientist, Interpretability

USD 315,000-560,000 per year
MIDDLE
✅ Hybrid

Required Skills & Competences

Python @ 2, Algorithms @ 3, Communication @ 6

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team focuses on mechanistic interpretability: reverse-engineering how neural network parameters map to meaningful algorithms. The team studies neural network components (neurons, attention heads, features, circuits) and builds tools and experiments, ranging from toy models to large-scale analyses, to discover mechanisms that explain model behavior and improve model safety.

Responsibilities

  • Develop methods for understanding large language models (LLMs) by reverse-engineering the algorithms learned in model weights
  • Design and run robust experiments both in toy scenarios and at scale in large models
  • Create and analyze interpretability features and circuits to better understand how models compute
  • Build infrastructure for running experiments and visualizing results
  • Communicate results clearly to colleagues and the public through write-ups and presentations

Requirements

  • Strong track record of scientific research (in any field); some prior work on interpretability is expected
  • Comfortable with messy experimental science and iterative investigation
  • Research and engineering skills: every team member writes code, designs and runs experiments, and interprets results
  • Familiarity with Python is required
  • Education: at least a Bachelor's degree in a related field or equivalent experience

Role-specific location policy

  • This role is based in the San Francisco office; Anthropic is open to considering exceptional candidates for remote work on a case-by-case basis
  • Location-based hybrid policy: currently, staff are expected to be in one of Anthropic's offices at least 25% of the time

Compensation

  • Annual Salary: $315,000 - $560,000 USD
  • Total compensation package for full-time employees includes equity and benefits, and may include incentive compensation

Logistics

  • Visa sponsorship: Anthropic sponsors visas for some roles and will make reasonable efforts to assist once an offer is made
  • Applicants are encouraged to apply even if they do not meet every qualification listed

How we work / Culture

  • The team values collaborative, high-impact work and treats AI research as an empirical, large-scale scientific effort
  • Strong communication skills and willingness to publish and share research are emphasized

Additional notes

  • The posting references work and publications related to transformers, transformer circuits, and mechanistic interpretability. Candidates should be prepared to engage with both the theory and the engineering sides of interpretability research.