Research Scientist, Interpretability

at Anthropic

📍 San Francisco, United States

USD 315,000-560,000 per year

MIDDLE

✅ Hybrid

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ^?

Python @ 2 Algorithms @ 3 Communication @ 6

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team focuses on mechanistic interpretability: reverse-engineering how neural network parameters map to meaningful algorithms. The team studies neural network components (neurons, attention heads, features, circuits) and builds tools and experiments—ranging from toy models to large-scale analyses—to discover mechanisms that explain model behavior and improve model safety.

Responsibilities

Develop methods for understanding large language models (LLMs) by reverse engineering algorithms learned in model weights
Design and run robust experiments both in toy scenarios and at scale in large models
Create and analyze interpretability features and circuits to better understand how models compute
Build infrastructure for running experiments and visualizing results
Communicate results clearly with colleagues and publicly (writing up results, presentations)

Requirements

Strong track record of scientific research (in any field); some prior work on interpretability is expected
Comfortable with messy experimental science and iterative investigation
Research and engineering skills: every team member writes code, designs and runs experiments, and interprets results
Familiarity with Python is required
Education: at least a Bachelor's degree in a related field or equivalent experience

Role-specific location policy

This role is based in the San Francisco office; Anthropic is open to considering exceptional candidates for remote work on a case-by-case basis
Location-based hybrid policy: currently, staff are expected to be in one of Anthropic's offices at least 25% of the time

Compensation

Annual Salary: $315,000 - $560,000 USD
Total compensation package for full-time employees includes equity, benefits, and may include incentive compensation

Logistics

Visa sponsorship: Anthropic does sponsor visas for some roles and will make reasonable efforts to assist when an offer is made
Applicants are encouraged to apply even if they do not meet every qualification listed

How we work / Culture

The team values collaborative, high-impact research that treats AI research as an empirical, large-scale scientific effort
Strong communication skills and willingness to publish and share research are emphasized

Additional notes

The role references work and publications related to transformers, transformer circuits, and mechanistic interpretability. Candidates should be prepared to engage with both theory and engineering aspects of interpretability research.