Research Engineer, Interpretability

USD 315,000–560,000 per year
✅ Remote ✅ Hybrid

Required Skills & Competences

  • Level 5: Go, Python, Java, Rust
  • Level 3: GitHub, Algorithms, Distributed Systems, Machine Learning, Experimentation, LLM, PyTorch

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team is focused on mechanistic interpretability: reverse-engineering how neural network parameters map to meaningful algorithms. The team builds tools and "microscopes" for neural networks, treats models as programs to reverse-engineer, and collaborates across Anthropic (e.g., Alignment Science, Pretraining). Representative publications and resources are linked in the original posting.
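
To make the "microscopes" idea concrete: much of this tooling starts with capturing a model's intermediate activations. The sketch below is purely illustrative (a toy model and invented names, not Anthropic's actual tooling); it uses a standard PyTorch forward hook, one common way to expose a network's internals for analysis:

```python
# Illustrative only: a forward hook captures a layer's intermediate
# activations, the raw material mechanistic interpretability works from.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

captured = {}  # activation store, keyed by an arbitrary layer name

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Attach the "microscope" to the hidden ReLU layer.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(8, 16)  # a toy batch of inputs
_ = model(x)
print(captured["hidden_relu"].shape)  # torch.Size([8, 32])
```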

Responsibilities

  • Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models.
  • Set up and optimize research workflows to run efficiently and reliably at large scale.
  • Build tools and abstractions to support a rapid pace of research experimentation (a minimal sketch of one such abstraction follows this list).
  • Develop and improve tools and infrastructure to support other teams in using Interpretability’s work to improve model safety.
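
As a rough illustration of the tools-and-abstractions bullet above (hypothetical names and structure, not Anthropic's codebase), one minimal pattern is a declarative experiment config plus a single entry point, so that a new experiment is a config instance rather than a new script:

```python
# Hypothetical sketch: a frozen dataclass describes an experiment, and a
# single run() entry point launches it, so sweeps are plain Python loops.
from dataclasses import dataclass

@dataclass(frozen=True)
class Experiment:
    name: str
    model_size: str
    seed: int = 0

def run(exp: Experiment) -> dict:
    # Placeholder body: a real runner would build the model, launch the
    # job, and log artifacts under exp.name.
    print(f"launching {exp.name} (model={exp.model_size}, seed={exp.seed})")
    return {"status": "ok"}

# A three-seed sweep is three config instances, not three scripts.
for seed in range(3):
    run(Experiment(name=f"probe-sweep-{seed}", model_size="small", seed=seed))
```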

Requirements

  • 5–10+ years of experience building software.
  • Highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive working in Python.
  • Some experience contributing to empirical AI research projects.
  • Strong ability to prioritize and direct effort toward the most impactful work; comfortable operating with ambiguity and questioning assumptions.
  • Prefer fast-moving collaborative projects to extensive solo efforts, and want to learn more about machine learning research while collaborating closely with researchers.
  • Care about the societal impacts and ethics of your work.
  • Education: at least a Bachelor's degree in a related field or equivalent experience.

Strong candidates (preferred / nice-to-have)

  • Designing a codebase so that anyone can quickly write experiments, launch them, and analyze the results with few bugs.
  • Optimizing performance of large-scale distributed systems.
  • Collaborating closely with researchers.
  • Language modeling with transformers.
  • Experience with GPUs or PyTorch.

Representative projects (examples of past or typical work)

  • Building Garcon, a tool that allows researchers to easily access LLM internals from a Jupyter notebook.
  • Setting up and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them.
  • Profiling and optimizing ML training, including parallelizing to many GPUs.
  • Making it fast and easy to launch ML experiments and to manipulate and analyze their results.
  • Creating an interactive visualization of attention between tokens in a language model (a toy version is sketched after this list).
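
As a toy version of the attention-visualization project above (the tokens, dimensions, and single head are invented for illustration), the core computation is the attention weight matrix softmax(QK^T / sqrt(d)), rendered token by token:

```python
# Illustrative only: random stand-in query/key vectors for six tokens,
# one attention head, and a heatmap of the resulting attention weights.
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

tokens = ["The", "cat", "sat", "on", "the", "mat"]
d = 16
torch.manual_seed(0)
q = torch.randn(len(tokens), d)  # stand-in query vectors
k = torch.randn(len(tokens), d)  # stand-in key vectors

# Scaled dot-product attention weights: softmax(QK^T / sqrt(d)).
weights = F.softmax(q @ k.T / d**0.5, dim=-1)

fig, ax = plt.subplots()
ax.imshow(weights.numpy(), cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
ax.set_xlabel("attended-to token")
ax.set_ylabel("attending token")
plt.show()
```

A production version would pull real attention patterns out of a trained model and add interactivity, but the token-by-token weight matrix is the object being visualized.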

Location & Office Policy

  • This role is based in the San Francisco office; Anthropic is open to considering exceptional candidates for remote work on a case-by-case basis.
  • The company expects staff to be in one of its offices at least 25% of the time (location-based hybrid policy), though some roles may require more time in offices.

Compensation

  • Expected base annual salary: $315,000–$560,000 USD.
  • Total compensation package for full-time employees includes equity, benefits, and may include incentive compensation.

Logistics & Other Information

  • Visa sponsorship: Anthropic does sponsor visas but cannot guarantee sponsorship for every role or candidate; it retains an immigration lawyer to assist once an offer is made.
  • Candidates are encouraged to apply even if they do not meet every qualification; the company places a strong emphasis on diversity and inclusion.
  • Candidate guidance on using AI in the application process is provided via a linked policy.

How to Apply / Additional Application Details

  • The posting includes an application form with fields for resume/CV, GitHub, publications, written prompts about fit and past work, earliest start date, visa questions, and other standard application items.

(Original posting contained multiple links to team resources, publications, and further reading.)