Research Manager, Interpretability

USD 340,000-425,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Machine Learning · Hiring · Leadership · People Management · Communication · Prioritization · Project Management

Details

Anthropic’s Interpretability team seeks a manager to support a team of researchers and engineers focused on mechanistic interpretability — reverse engineering how modern large language models work at a deep, mechanistic level. The manager will partner with an individual-contributor research lead to translate research ideas into tangible goals, manage team execution and careers, and drive hiring and cross-team collaboration. The team’s work centers on mechanistic interpretability (treating neural networks like programs to be reverse engineered) and has produced publications and methods in circuits, feature extraction, and attribution graphs.

Responsibilities

  • Partner with a research lead on direction, project planning and execution, hiring, and people development
  • Set and maintain a high bar for execution speed and quality; identify process improvements to help the team operate effectively
  • Coach and support team members to increase impact and develop careers (performance evaluation, mentorship)
  • Drive the team's recruiting efforts, including hiring planning, process improvements, sourcing, and closing
  • Identify and support opportunities for collaboration with other teams across Anthropic
  • Communicate team updates and results to other teams and leadership
  • Maintain a deep understanding of the team's technical work and its implications for AI safety

Requirements / Qualifications

  • Experienced manager (at least 2–5 years) with a track record of leading highly technical research and/or engineering teams
  • Background in machine learning, AI, or a related technical field
  • Enjoys people management and has experience with coaching, mentorship, performance evaluation, career development, and hiring for technical roles
  • Strong project management skills, including prioritization and cross-functional coordination and collaboration
  • Experience managing technical teams through ambiguity and change
  • Quick learner, capable of understanding and contributing to discussions on complex technical topics and motivated to learn about the team’s research
  • Strong verbal and written communication skills
  • Commitment to AI safety and Anthropic’s mission

Strong candidates may also have

  • Experience scaling engineering infrastructure
  • Experience working on open-ended, exploratory research agendas aimed at foundational insights
  • Some familiarity with mechanistic interpretability and the team’s prior work

Role-specific location policy

  • This role is expected to be in the San Francisco office for 3 days a week (hybrid). Anthropic currently expects staff to be in an office at least 25% of the time.

Compensation

  • Annual base salary: $340,000 - $425,000 USD
  • Total compensation for full-time employees includes equity and benefits, and may include incentive compensation

Logistics

  • Education: Minimum of a Bachelor's degree in a related field or equivalent experience
  • Visa sponsorship: Anthropic sponsors visas in many cases and retains immigration counsel to assist where possible
  • Anthropic encourages applications from candidates who may not meet every listed qualification and values diverse perspectives

About the team and how we work

  • The Interpretability team focuses on mechanistic interpretability (circuit-style work) and has published work on extracting interpretable features from large language models and methods for building circuits and attribution graphs
  • Anthropic emphasizes collaborative, large-scale research efforts and values communication skills

How to apply

  • Apply via the Anthropic careers page. The application requests standard contact information, resume/CV or LinkedIn, responses to questions about fit and experience, and asks applicants to acknowledge candidate AI usage guidance.