Research Scientist, Interpretability

at Anthropic

📍 San Francisco, United States

USD 315,000-560,000 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 2
Algorithms @ 3
Communication @ 6

Details

Anthropic is on a mission to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The Interpretability team focuses on mechanistic interpretability, which aims to uncover how neural network parameters correspond to meaningful algorithms. The work is akin to a "biology" or "neuroscience" of neural networks, carried out with custom-built tools.

Responsibilities

Develop methods to understand large language models (LLMs) by reverse engineering the algorithms encoded in their weights.
Design and execute robust experiments both in toy models and at scale in large models.
Create and analyze new interpretability features and circuits to improve understanding of model behaviors.
Build infrastructure for conducting experiments and visualizing results.
Collaborate internally and communicate research findings publicly.

Requirements

Strong track record of scientific research in any field, with some experience in interpretability.
Enjoyment of collaborative work as part of a scientific team.
Comfort with exploratory experimental science in a field that is still being developed.
Ability to combine research and engineering by writing code, running experiments, and interpreting results.
Strong communication skills for articulating, discussing, and documenting research results.
Familiarity with Python is required.

Benefits

Competitive compensation and benefits.
Optional equity donation matching.
Generous vacation and parental leave.
Flexible working hours.
Collaborative office environment located in San Francisco.

Role Specific Location Policy

This role is based in the San Francisco office with openness to exceptional remote candidates on a case-by-case basis.

Additional Info

Education: At least a Bachelor's degree in a related field or equivalent experience.
Hybrid policy: Staff are expected to be in the office at least 25% of the time.
Visa sponsorship is available, and the company will make reasonable efforts to obtain a visa.
Commitment to diversity and inclusion and encouragement to apply even if not meeting every qualification.
Emphasis on collaborative, big-science AI research aligned with the long-term goals of safe and trustworthy AI.