Research Engineer, Interpretability

USD 315,000–560,000 per year
✅ Remote ✅ Hybrid

Required Skills & Competences

  • Level 5: Go, Python, Java, Rust
  • Level 3: GitHub, Algorithms, Distributed Systems, Machine Learning, Experimentation, LLM, PyTorch

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Interpretability team is focused on mechanistic interpretability: reverse-engineering how neural network parameters map to meaningful algorithms. The team builds tools and "microscopes" for neural networks, treats models as programs to reverse-engineer, and collaborates across Anthropic (e.g., Alignment Science, Pretraining). Representative publications and resources are linked in the original posting.
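
To make the "microscopes" idea concrete: much of this tooling starts with capturing a model's intermediate activations. The sketch below is purely illustrative (a toy model and invented names, not Anthropic's actual tooling); it uses a standard PyTorch forward hook, one common way to expose a network's internals for analysis:

```python
# Illustrative only: a forward hook captures a layer's intermediate
# activations, the raw material mechanistic interpretability works from.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

captured = {}  # activation store, keyed by an arbitrary layer name

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Attach the "microscope" to the hidden ReLU layer.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(8, 16)  # a toy batch of inputs
_ = model(x)
print(captured["hidden_relu"].shape)  # torch.Size([8, 32])
```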

Responsibilities

  • Implement and analyze research experiments, both quickly in toy scenarios and at scale in large models.
  • Set up and optimize research workflows to run efficiently and reliably at large scale.
  • Build tools and abstractions to support a rapid pace of research experimentation (a minimal sketch of one such abstraction follows this list).
  • Develop and improve tools and infrastructure to support other teams in using Interpretability’s work to improve model safety.
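
As a rough illustration of the tools-and-abstractions bullet above (hypothetical names and structure, not Anthropic's codebase), one minimal pattern is a declarative experiment config plus a single entry point, so that a new experiment is a config instance rather than a new script:

```python
# Hypothetical sketch: a frozen dataclass describes an experiment, and a
# single run() entry point launches it, so sweeps are plain Python loops.
from dataclasses import dataclass

@dataclass(frozen=True)
class Experiment:
    name: str
    model_size: str
    seed: int = 0

def run(exp: Experiment) -> dict:
    # Placeholder body: a real runner would build the model, launch the
    # job, and log artifacts under exp.name.
    print(f"launching {exp.name} (model={exp.model_size}, seed={exp.seed})")
    return {"status": "ok"}

# A three-seed sweep is three config instances, not three scripts.
for seed in range(3):
    run(Experiment(name=f"probe-sweep-{seed}", model_size="small", seed=seed))
```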

Requirements

  • 5–10+ years of experience building software.
  • Highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive working in Python.
  • Some experience contributing to empirical AI research projects.
  • Strong ability to prioritize and direct effort toward the most impactful work; comfortable operating with ambiguity and questioning assumptions.
  • Prefer fast-moving collaborative projects to extensive solo efforts, and want to learn more about machine learning research while collaborating closely with researchers.
  • Care about the societal impacts and ethics of your work.
  • Education: at least a Bachelor's degree in a related field or equivalent experience.

Strong candidates (preferred / nice-to-have)

  • Designing a codebase so that anyone can quickly write experiments, launch them, and analyze the results with few bugs.
  • Optimizing performance of large-scale distributed systems.
  • Collaborating closely with researchers.
  • Language modeling with transformers.
  • Experience with GPUs or PyTorch.

Representative projects (examples of past or typical work)

  • Building Garcon, a tool that allows researchers to easily access LLM internals from a Jupyter notebook.
  • Setting up and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them.
  • Profiling and optimizing ML training, including parallelizing to many GPUs.
  • Making it fast and easy to launch ML experiments and to manipulate and analyze their results.
  • Creating an interactive visualization of attention between tokens in a language model (a toy version is sketched after this list).
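
As a toy version of the attention-visualization project above (the tokens, dimensions, and single head are invented for illustration), the core computation is the attention weight matrix softmax(QK^T / sqrt(d)), rendered token by token:

```python
# Illustrative only: random stand-in query/key vectors for six tokens,
# one attention head, and a heatmap of the resulting attention weights.
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

tokens = ["The", "cat", "sat", "on", "the", "mat"]
d = 16
torch.manual_seed(0)
q = torch.randn(len(tokens), d)  # stand-in query vectors
k = torch.randn(len(tokens), d)  # stand-in key vectors

# Scaled dot-product attention weights: softmax(QK^T / sqrt(d)).
weights = F.softmax(q @ k.T / d**0.5, dim=-1)

fig, ax = plt.subplots()
ax.imshow(weights.numpy(), cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
ax.set_xlabel("attended-to token")
ax.set_ylabel("attending token")
plt.show()
```

A production version would pull real attention patterns out of a trained model and add interactivity, but the token-by-token weight matrix is the object being visualized.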

Location & Office Policy

  • This role is based in the San Francisco office; Anthropic is open to considering exceptional candidates for remote work on a case-by-case basis.
  • The company expects staff to be in one of its offices at least 25% of the time (location-based hybrid policy), though some roles may require more time in offices.

Compensation

  • Expected base annual salary: $315,000–$560,000 USD.
  • Total compensation package for full-time employees includes equity, benefits, and may include incentive compensation.

Logistics & Other Information

  • Visa sponsorship: Anthropic does sponsor visas but cannot guarantee sponsorship for every role or candidate; it retains an immigration lawyer to assist once an offer is made.
  • Candidates are encouraged to apply even if they do not meet every qualification; the company places a strong emphasis on diversity and inclusion.
  • Candidate guidance on using AI in the application process is provided via a linked policy.

How to Apply / Additional Application Details

  • The posting includes an application form with fields for resume/CV, GitHub, publications, written prompts about fit and past work, earliest start date, visa questions, and other standard application items.

(Original posting contained multiple links to team resources, publications, and further reading.)