Research Engineer, Interpretability

USD 315,000-560,000 per year
MIDDLE
Hybrid

Required Skills & Competences

Go @ 5, Python @ 5, Java @ 5, GitHub @ 3, Algorithms @ 3, Distributed Systems @ 3, Machine Learning @ 3, Communication @ 3, Rust @ 5, Experimentation @ 3, LLM @ 3, PyTorch @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team focuses on mechanistic interpretability (discovering how neural network parameters map to meaningful algorithms) and builds tools and infrastructure to reverse-engineer how trained models work. The team's work spans empirical research, tooling, large-scale data pipelines, and collaborations across Anthropic to improve model safety.

Responsibilities

  • Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models
  • Set up and optimize research workflows to run efficiently and reliably at large scale
  • Build tools and abstractions to support a rapid pace of research experimentation
  • Develop and improve tools and infrastructure to support other teams in using Interpretability’s work to improve model safety

Requirements

  • 5–10+ years of experience building software
  • Highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python
  • Some experience contributing to empirical AI research projects
  • Strong ability to prioritize and direct effort toward the most impactful work; comfortable operating with ambiguity and questioning assumptions
  • Preference for collaborative, fast-moving projects
  • Interest in learning machine learning research and collaborating closely with researchers
  • Care about the societal impacts and ethics of your work
  • Education: at least a Bachelor’s degree in a related field or equivalent experience

Strong candidates may also have experience with

  • Designing a codebase so others can quickly run experiments, launch them, and analyze results without hitting bugs
  • Optimizing performance of large-scale distributed systems
  • Collaborating closely with researchers
  • Language modeling with transformers
  • GPUs or PyTorch

Representative projects / examples of work

  • Building Garcon: a tool that allows researchers to easily access LLM internals from a Jupyter notebook
  • Setting up and optimizing a pipeline to collect petabytes of transformer activations and shuffle them
  • Profiling and optimizing ML training, including parallelizing to many GPUs
  • Making launching ML experiments and analyzing results fast and easy
  • Creating interactive visualizations of attention between tokens in a language model

Location & Office Policy

  • This role is based in the San Francisco office; exceptional candidates may be considered for remote work on a case-by-case basis.
  • Currently, staff are expected to be in one of Anthropic’s offices at least 25% of the time. Some roles may require more in-office time.

Compensation

  • Annual base salary range: $315,000 - $560,000 USD
  • Total compensation for full-time employees includes equity and benefits, and may include incentive compensation

Logistics

  • Visa sponsorship: Anthropic sponsors visas where feasible and retains immigration counsel to assist with the process
  • Application materials: resume/CV or LinkedIn profile required; optional cover letter and links (GitHub, publications)

Why Anthropic / Culture

  • The team values large-scale, high-impact empirical AI research and collaborative work across disciplines
  • Emphasis on communication, safety, and societal/ethical implications of AI

How to apply

  • Submit application via Anthropic’s careers page. Anthropic encourages applicants from diverse backgrounds and those who may not meet every listed qualification to apply.