Research Engineer, Interpretability

at Anthropic

📍 San Francisco, United States

USD 315,000-560,000 per year

MIDDLE

✅ Hybrid

Required Skills & Competences

Go @ 5, Python @ 5, Java @ 5, Rust @ 5, GitHub @ 3, Algorithms @ 3, Distributed Systems @ 3, Machine Learning @ 3, Communication @ 3, Experimentation @ 3, LLM @ 3, PyTorch @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team focuses on mechanistic interpretability, that is, discovering how neural network parameters map to meaningful algorithms, and builds tools and infrastructure to reverse-engineer how trained models work. The team’s work spans empirical research, tooling, large-scale data pipelines, and collaborations across Anthropic to improve model safety.

Responsibilities

Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models
Set up and optimize research workflows to run efficiently and reliably at large scale
Build tools and abstractions to support a rapid pace of research experimentation
Develop and improve tools and infrastructure to support other teams in using Interpretability’s work to improve model safety

Requirements

5–10+ years of experience building software
Highly proficient in at least one programming language (for example Python, Rust, Go, or Java) and productive with Python
Some experience contributing to empirical AI research projects
Strong ability to prioritize and direct effort toward the most impactful work; comfortable operating with ambiguity and questioning assumptions
Preference for collaborative, fast-moving projects
Interest in learning machine learning research and collaborating closely with researchers
Care about the societal impacts and ethics of their work
Education: at least a Bachelor’s degree in a related field or equivalent experience

Strong candidates may also have experience with

Designing a codebase so others can quickly write experiments, launch them, and analyze the results without hitting bugs
Optimizing performance of large-scale distributed systems
Collaborating closely with researchers
Language modeling with transformers
GPUs or PyTorch

Representative projects / examples of work

Building Garcon: a tool that allows researchers to easily access LLM internals from a Jupyter notebook
Setting up and optimizing a pipeline to collect petabytes of transformer activations and shuffle them
Profiling and optimizing ML training, including parallelizing to many GPUs
Making launching ML experiments and analyzing results fast and easy
Creating interactive visualizations of attention between tokens in a language model

Location & Office Policy

This role is based in the San Francisco office; exceptional candidates may be considered for remote work on a case-by-case basis.
Currently, staff are expected to be in one of Anthropic’s offices at least 25% of the time. Some roles may require more in-office time.

Compensation

Annual base salary range: $315,000 - $560,000 USD
Total compensation for full-time employees includes equity and benefits, and may include incentive compensation

Logistics

Visa sponsorship: Anthropic sponsors visas where feasible and retains immigration counsel to assist
Application materials: resume/CV or LinkedIn profile required; optional cover letter and links (GitHub, publications)

Why Anthropic / Culture

The team values large-scale, high-impact empirical AI research and collaborative work across disciplines
Emphasis on communication, safety, and societal/ethical implications of AI

How to apply

Submit application via Anthropic’s careers page. Anthropic encourages applicants from diverse backgrounds and those who may not meet every listed qualification to apply.