Used Tools & Technologies
Not specified
Required Skills & Competences
- Python @ 2
- Algorithms @ 3
- Communication @ 6
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team focuses on mechanistic interpretability: reverse-engineering how neural network parameters map to meaningful algorithms. The team studies neural network components (neurons, attention heads, features, circuits) and builds tools and experiments—ranging from toy models to large-scale analyses—to discover mechanisms that explain model behavior and improve model safety.
Responsibilities
- Develop methods for understanding large language models (LLMs) by reverse engineering algorithms learned in model weights
- Design and run robust experiments both in toy scenarios and at scale in large models
- Create and analyze interpretability features and circuits to better understand how models compute
- Build infrastructure for running experiments and visualizing results
- Communicate results clearly with colleagues and publicly (writing up results, presentations)
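To make the circuit-analysis work above concrete, here is a minimal, purely illustrative sketch of one standard mechanistic-interpretability computation: deriving a single attention head's attention pattern directly from its weights. All names, shapes, and the toy setup are assumptions for illustration; they are not Anthropic's actual tooling.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup (illustrative): one attention head with d_model=4,
# d_head=2, and a sequence of 3 tokens.
rng = np.random.default_rng(0)
W_Q = rng.normal(size=(4, 2))    # query projection weights
W_K = rng.normal(size=(4, 2))    # key projection weights
resid = rng.normal(size=(3, 4))  # residual-stream activations per token

# The head's attention scores depend on the weight product W_Q @ W_K.T,
# so the head's attention behavior can be studied from the weights alone.
scores = resid @ W_Q @ W_K.T @ resid.T / np.sqrt(2)
pattern = softmax(scores, axis=-1)  # row i: where token i attends

print(pattern.shape)         # (3, 3)
print(pattern.sum(axis=-1))  # each row sums to 1
```

Analyses like this, scaled from toy models to large language models, are one example of the kind of experiment the responsibilities above describe.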
Requirements
- Strong track record of scientific research (in any field); some prior work on interpretability is expected
- Comfortable with messy experimental science and iterative investigation
- Research and engineering skills: every team member writes code, designs and runs experiments, and interprets results
- Familiarity with Python is required
- Education: at least a Bachelor's degree in a related field or equivalent experience
Role-specific location policy
- This role is based in the San Francisco office; Anthropic is open to considering exceptional candidates for remote work on a case-by-case basis
- Location-based hybrid policy: currently, staff are expected to be in one of Anthropic's offices at least 25% of the time
Compensation
- Annual Salary: $315,000 - $560,000 USD
- Total compensation package for full-time employees includes equity, benefits, and may include incentive compensation
Logistics
- Visa sponsorship: Anthropic does sponsor visas for some roles and will make reasonable efforts to assist when an offer is made
- Applicants are encouraged to apply even if they do not meet every qualification listed
How we work / Culture
- The team values collaborative, high-impact work and treats AI research as an empirical, large-scale scientific effort
- Strong communication skills and willingness to publish and share research are emphasized
Additional notes
- The role references work and publications related to transformers, transformer circuits, and mechanistic interpretability. Candidates should be prepared to engage with both theory and engineering aspects of interpretability research.