Research Manager, Interpretability

USD 340,000–425,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Algorithms @ 3
  • Machine Learning @ 3
  • Hiring @ 3
  • Leadership @ 3
  • People Management @ 3
  • Communication @ 6
  • Prioritization @ 6
  • Project Management @ 6

Details

Anthropic’s Interpretability team researches mechanistic interpretability of large language models to discover how neural network parameters map to meaningful algorithms and to build a scientific foundation for making models reliable, interpretable, and steerable. The team focuses on reverse-engineering model computation ("circuits"), resolving superposition, decomposing models into interpretable components, and applying these methods to production models (e.g., Claude series).

Responsibilities

  • Partner with a research lead on direction, project planning and execution, hiring, and people development
  • Set and maintain a high bar for execution speed and quality; identify process improvements to help the team operate effectively
  • Coach and support team members to increase impact and develop their careers
  • Drive the team's recruiting efforts, including hiring planning, process improvements, sourcing, and closing
  • Help identify and support opportunities for collaboration with other teams across Anthropic
  • Communicate team updates and results to other teams and leadership
  • Maintain a deep understanding of the team's technical work and its implications for AI safety

Requirements

  • Experienced manager (roughly 2–5+ years) with a track record of effectively leading highly technical research and/or engineering teams
  • Background in machine learning, AI, or a related technical field
  • Active enjoyment of people management and experience with coaching, mentorship, performance evaluation, career development, and hiring for technical roles
  • Strong project management skills, including prioritization and cross-functional coordination
  • Experience managing technical teams through periods of ambiguity and change
  • Quick learner, capable of understanding and contributing to discussions on complex technical topics
  • Strong written and verbal communication skills
  • Commitment to AI safety and the mission of building steerable, trustworthy AI systems

Strong candidates may also have

  • Experience scaling engineering infrastructure
  • Experience working on open-ended, exploratory research agendas aimed at foundational insights
  • Familiarity with mechanistic interpretability and related research (e.g., circuit-based interpretability, feature discovery)

Role-specific location policy

  • This role is expected to be in the San Francisco office ~3 days a week (hybrid). Anthropic expects staff to be in one of its offices at least ~25% of the time; some roles may require more time in-office.

Logistics & Education

  • Minimum: Bachelor's degree in a related field or equivalent experience
  • Visa sponsorship: Anthropic may sponsor visas and, if an offer is made, will make reasonable efforts to obtain one

Benefits

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office in San Francisco

Additional notes

  • The team emphasizes collaborative, large-scale research efforts and values strong communication and impact-driven work. Familiarity with the team's prior work (circuits, monosemanticity, activation atlases, etc.) is helpful but not strictly required.