Research Scientist, Interpretability

USD 315,000-560,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Go @ 3 Python @ 2 Algorithms @ 3 Communication @ 3

Details

Anthropic’s Interpretability team is focused on mechanistic interpretability: reverse-engineering how trained neural networks implement algorithms and discovering how neural network parameters map to meaningful computations. The team works across toy models and large production language models to find features, build circuits, and develop tools that improve our mechanistic understanding of LLMs and make models safer. The work is highly collaborative and combines research and engineering: everyone writes code, designs and runs experiments, and interprets results.

Responsibilities

  • Develop methods for understanding large language models by reverse engineering algorithms learned in their weights
  • Design and run robust experiments, both quickly in toy scenarios and at scale in large models
  • Create and analyze interpretability features and circuits to better understand model computation
  • Build infrastructure for running experiments and visualizing results
  • Communicate results clearly, both to colleagues internally and to the public

Requirements

  • Strong track record of scientific research (in any field); some prior work on interpretability is expected
  • Comfortable with messy, experimental research and inventing methods as you go
  • Able to combine research and engineering: write code, design and run experiments, and interpret results
  • Able to clearly articulate motivations and findings and write up results (including null results)
  • Familiarity with Python (required)

For guidance on preparing for this role, the team points to their blog post "So You Want to Work in Mechanistic Interpretability?" and several publications and posts describing their research directions and methods.

Role-specific location policy

  • This role is based in the San Francisco office; exceptional candidates may be considered for remote work on a case-by-case basis.
  • Anthropic currently expects staff to be in an office at least 25% of the time; some roles may require more on-site presence.

Logistics

  • Education requirements: At least a Bachelor's degree in a related field or equivalent experience
  • Visa sponsorship: Anthropic sponsors visas where feasible and retains immigration counsel to assist once an offer is made

Compensation

  • Annual salary range: $315,000 - $560,000 USD

Benefits & culture

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office in San Francisco
  • Collaborative, large-scale research approach valuing high-impact science and clear communication

How we're different

  • Anthropic treats AI research as "big science": the company works as a single cohesive team on a few large-scale research efforts. The group values empirical approaches and cross-disciplinary thinking, drawing on ideas from physics, biology, and computer science.