Research Manager, Interpretability

USD 340,000-425,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Machine Learning · Hiring · Leadership · People Management · Communication · Prioritization · Project Management

Details

Anthropic’s Interpretability team seeks a manager to support a team of researchers and engineers focused on mechanistic interpretability — reverse engineering how modern large language models work at a deep, mechanistic level. The manager will partner with an individual-contributor research lead to translate research ideas into tangible goals, manage team execution and careers, and drive hiring and cross-team collaboration. The team’s work centers on mechanistic interpretability (treating neural networks like programs to be reverse engineered) and has produced publications and methods in circuits, feature extraction, and attribution graphs.

Responsibilities

  • Partner with a research lead on direction, project planning and execution, hiring, and people development
  • Set and maintain a high bar for execution speed and quality; identify process improvements to help the team operate effectively
  • Coach and support team members to increase impact and develop careers (performance evaluation, mentorship)
  • Drive the team's recruiting efforts, including hiring planning, process improvements, sourcing, and closing
  • Identify and support opportunities for collaboration with other teams across Anthropic
  • Communicate team updates and results to other teams and leadership
  • Maintain a deep understanding of the team's technical work and its implications for AI safety

Requirements / Qualifications

  • Experienced manager (at least 2–5 years) with a track record of leading highly technical research and/or engineering teams
  • Background in machine learning, AI, or a related technical field
  • Enjoys people management and has experience with coaching, mentorship, performance evaluation, career development, and hiring for technical roles
  • Strong project management skills, including prioritization and cross-functional coordination and collaboration
  • Experience managing technical teams through ambiguity and change
  • Quick learner, capable of understanding and contributing to discussions on complex technical topics and motivated to learn about the team’s research
  • Strong verbal and written communication skills
  • Commitment to AI safety and Anthropic’s mission

Strong candidates may also have

  • Experience scaling engineering infrastructure
  • Experience working on open-ended, exploratory research agendas aimed at foundational insights
  • Some familiarity with mechanistic interpretability and the team’s prior work

Role-specific location policy

  • This role is expected to be in the San Francisco office for 3 days a week (hybrid). Anthropic currently expects staff to be in an office at least 25% of the time.

Compensation

  • Annual base salary: $340,000 - $425,000 USD
  • Total compensation for full-time employees includes equity and benefits, and may include incentive compensation

Logistics

  • Education: Minimum of a Bachelor's degree in a related field or equivalent experience
  • Visa sponsorship: Anthropic sponsors visas in many cases and retains immigration counsel to assist where possible
  • Anthropic encourages applications from candidates who may not meet every listed qualification and values diverse perspectives

About the team and how we work

  • The Interpretability team focuses on mechanistic interpretability (circuit-style work) and has published work on extracting interpretable features from large language models and methods for building circuits and attribution graphs
  • Anthropic emphasizes collaborative, large-scale research efforts and values communication skills

How to apply

  • Apply via the Anthropic careers page. The application requests standard contact information, resume/CV or LinkedIn, responses to questions about fit and experience, and asks applicants to acknowledge candidate AI usage guidance.