Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 2, Algorithms @ 3, Data Analysis @ 3
Details
Anthropic’s Interpretability team is seeking researchers and engineers to reverse-engineer how language models work, with a focus on mechanistic interpretability: discovering how neural network parameters implement meaningful algorithms. The team combines experimental research, engineering, and collaboration across Anthropic to build tools, run experiments at scale, and produce mechanistic accounts of model behavior.
Responsibilities
- Develop methods to understand large language models by reverse engineering algorithms learned in weights
- Design and run robust experiments, both quickly in toy scenarios and at scale in large models
- Create and analyze interpretability features and circuits to understand model computation
- Build infrastructure for running experiments and visualizing results
- Communicate results clearly with colleagues and publicly (writing up findings, preparing visualizations and documentation)
Requirements
- Strong track record of scientific research (in any field); some prior work on interpretability is expected
- Familiarity with Python is required
- Experience designing and running experiments and analyzing results (toy-scale and large-scale)
- Comfortable with messy, exploratory experimental science and collaborative team research
- Ability to write code, build experiment infrastructure, and perform data analysis and visualization
- Ability to communicate research results clearly in writing and presentations; publications or public research outputs are requested
- Education: at least a Bachelor's degree in a related field or equivalent experience
Role location & policy
- Role is based in the San Francisco office (San Francisco, CA). Anthropic is open to considering exceptional candidates for remote work on a case-by-case basis.
- Currently, staff are expected to be in one of Anthropic's offices at least 25% of the time (location-based hybrid policy)
- Visa sponsorship: Anthropic does sponsor visas in many cases and retains an immigration lawyer to assist when an offer is made
Compensation
- Expected base annual salary range: $315,000 - $560,000 USD (total compensation may include equity, benefits, and incentive compensation)
Nice-to-have / team fit
- Interest in mechanistic interpretability, circuits, and transformer analysis
- Enjoys collaborative, team-focused science and frequent research discussions
- Willingness to write up and share findings publicly, including null results
How to apply / logistics
- Applicants are asked to provide publications/research outputs (e.g., Google Scholar, Semantic Scholar). If you do not have publications, consider applying to a Research Engineer role instead.
- Anthropic encourages applications from candidates who may not meet every listed qualification and values diverse perspectives.