Research Scientist, Interpretability

at Anthropic

📍 San Francisco, United States

USD 315,000-560,000 per year

✅ Hybrid

Details

Anthropic’s Interpretability team focuses on mechanistic interpretability: reverse-engineering how trained neural networks implement algorithms and discovering how neural network parameters map to meaningful computations. The team works on both toy models and large production language models to find features, build circuits, and develop tools that improve our mechanistic understanding of LLMs and make models safer. Teamwork is highly collaborative and combines research and engineering: everyone writes code, designs and runs experiments, and interprets results.

Responsibilities

Develop methods for understanding large language models by reverse engineering algorithms learned in their weights
Design and run robust experiments, both quickly in toy scenarios and at scale in large models
Create and analyze interpretability features and circuits to better understand model computation
Build infrastructure for running experiments and visualizing results
Communicate results clearly with colleagues internally and publicly

Requirements

Strong track record of scientific research (in any field); some prior work on interpretability is expected
Comfortable with messy, experimental research and inventing methods as you go
Able to combine research and engineering: write code, design and run experiments, and interpret results
Able to clearly articulate motivations and findings and write up results (including null results)
Familiarity with Python (required)

For guidance on preparing for this role, the team points to their blog post "So You Want to Work in Mechanistic Interpretability?" and several publications and posts describing their research directions and methods.

Role-specific location policy

This role is based in the San Francisco office; exceptional candidates may be considered for remote work on a case-by-case basis.
The organization currently expects staff to be in an office at least roughly 25% of the time; some roles may require more on-site presence.

Logistics

Education requirements: At least a Bachelor's degree in a related field or equivalent experience
Visa sponsorship: Anthropic sponsors visas where feasible and retains immigration counsel to assist once an offer is made

Compensation

Annual salary range: $315,000 - $560,000 USD

Benefits & culture

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office in San Francisco
Collaborative, large-scale research approach valuing high-impact science and clear communication

How we're different

Anthropic approaches AI research as "big science," working as a single cohesive team on a few large-scale research efforts. The company values empirical approaches and cross-disciplinary thinking, drawing on ideas from physics, biology, and computer science.