Research Engineer, Interpretability

USD 315,000–560,000 per year
Middle / Senior
Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 5, Go @ 5, Java @ 5, Rust @ 5, Algorithms @ 3, Distributed Systems @ 3, Machine Learning @ 3, Communication @ 3, Experimentation @ 3, LLM @ 3, PyTorch @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team focuses on mechanistic interpretability — discovering how neural network parameters map to meaningful algorithms — to build a foundation for understanding and making models safe. The team collaborates with other groups across Anthropic (for example, Alignment Science and Pretraining) and publishes research on subjects such as transformer circuits, superposition, and feature discovery.

Responsibilities

  • Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models.
  • Set up and optimize research workflows to run efficiently and reliably at large scale.
  • Build tools and abstractions to support a rapid pace of research experimentation.
  • Develop and improve tools and infrastructure to help other teams use Interpretability’s work to improve model safety.

Requirements

  • 5–10+ years of experience building software.
  • Highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive in Python.
  • Some experience contributing to empirical AI research projects.
  • Strong ability to prioritize and direct effort toward the most impactful work; comfortable operating with ambiguity and questioning assumptions.
  • Prefer fast-moving collaborative projects to extensive solo efforts.
  • Interest in machine learning research and close collaboration with researchers.
  • Care about the societal impacts and ethics of your work.

Strong candidates may also have experience with

  • Designing a codebase so that anyone can quickly code experiments, launch them, and analyze results without hitting bugs.
  • Optimizing the performance of large-scale distributed systems.
  • Collaborating closely with researchers.
  • Language modeling with transformers.
  • GPUs or PyTorch.

Representative projects

  • Building Garcon, a tool that lets researchers access LLM internals from a Jupyter notebook (a toy sketch of the underlying hook pattern follows this list).
  • Setting up and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them.
  • Profiling and optimizing ML training, including parallelizing to many GPUs.
  • Making it fast and easy to launch ML experiments and to manipulate and analyze their results.
  • Creating interactive visualizations of attention between tokens in a language model.
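
For a concrete flavor of the internals-access projects above, here is a minimal, hypothetical sketch of the pattern such tooling builds on: capturing a transformer layer's attention with PyTorch forward hooks. This is not Anthropic's actual Garcon; the toy model, module names, and shapes are illustrative assumptions only.

    import torch
    import torch.nn as nn

    class ToyBlock(nn.Module):
        """Toy stand-in for one transformer block (a real model would be loaded instead)."""
        def __init__(self, d_model=64, n_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )

        def forward(self, x):
            attn_out, _ = self.attn(x, x, x, need_weights=True)
            x = x + attn_out
            return x + self.mlp(x)

    captured = {}  # activations stashed here, keyed by module name

    def make_hook(name):
        # Forward hook: record the module's output so a notebook can inspect it later.
        def hook(module, inputs, output):
            captured[name] = output
        return hook

    model = nn.Sequential(ToyBlock(), ToyBlock())
    for name, module in model.named_modules():
        if isinstance(module, nn.MultiheadAttention):
            module.register_forward_hook(make_hook(name))

    with torch.no_grad():
        model(torch.randn(1, 16, 64))  # batch of 1, 16 tokens, d_model = 64

    # MultiheadAttention returns (attn_output, attn_weights); the weights are the
    # token-to-token attention pattern one might render as an interactive heatmap.
    for name, (attn_out, attn_weights) in captured.items():
        print(name, tuple(attn_weights.shape))  # e.g. "0.attn (1, 16, 16)"

The hook pattern is the common core; a production tool would additionally need to run such hooks inside remote, possibly sharded model servers while keeping the analysis loop in the researcher's notebook.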

Logistics

  • Role location: based in the San Francisco office; exceptional candidates may be considered for remote work on a case-by-case basis.
  • Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time; some roles may require more time in office.
  • Education: At least a Bachelor's degree in a related field or equivalent experience is required.
  • Visa sponsorship: Anthropic does sponsor visas in many cases and retains an immigration lawyer to assist when an offer is made.

Compensation & Benefits

  • Annual salary range: $315,000–$560,000 USD.
  • Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office space for collaboration.

How we work

  • Anthropic emphasizes large-scale, collaborative research efforts with a strong focus on impact and empirical science. The team values communication and frequent research discussions.

Notes

  • The role centers on mechanistic interpretability and building production-ready tools and infrastructure to support interpretability research at scale. Technologies and practices explicitly mentioned in the posting (e.g., Python, PyTorch, transformers, GPUs, distributed systems, Jupyter) are relevant for the role.