Research Manager, Interpretability

USD 340,000–425,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Algorithms @ 3
  • Machine Learning @ 3
  • Hiring @ 3
  • Leadership @ 3
  • People Management @ 3
  • Communication @ 6
  • Prioritization @ 6
  • Project Management @ 6

Details

Anthropic’s Interpretability team researches mechanistic interpretability of large language models to discover how neural network parameters map to meaningful algorithms and to build a scientific foundation for making models reliable, interpretable, and steerable. The team focuses on reverse-engineering model computation ("circuits"), resolving superposition, decomposing models into interpretable components, and applying these methods to production models (e.g., Claude series).

Responsibilities

  • Partner with a research lead on direction, project planning and execution, hiring, and people development
  • Set and maintain a high bar for execution speed and quality; identify process improvements to help the team operate effectively
  • Coach and support team members to increase impact and develop their careers
  • Drive the team's recruiting efforts, including hiring planning, process improvements, sourcing, and closing
  • Help identify and support opportunities for collaboration with other teams across Anthropic
  • Communicate team updates and results to other teams and leadership
  • Maintain a deep understanding of the team's technical work and its implications for AI safety

Requirements

  • Experienced manager (roughly 2–5+ years) with a track record of effectively leading highly technical research and/or engineering teams
  • Background in machine learning, AI, or a related technical field
  • Active enjoyment of people management and experience with coaching, mentorship, performance evaluation, career development, and hiring for technical roles
  • Strong project management skills, including prioritization and cross-functional coordination
  • Experience managing technical teams through periods of ambiguity and change
  • Quick learner, capable of understanding and contributing to discussions on complex technical topics
  • Strong written and verbal communication skills
  • Commitment to AI safety and the mission of building steerable, trustworthy AI systems

Strong candidates may also have

  • Experience scaling engineering infrastructure
  • Experience working on open-ended, exploratory research agendas aimed at foundational insights
  • Familiarity with mechanistic interpretability and related research (e.g., circuit-based interpretability, feature discovery)

Role-specific location policy

  • This role is expected to be in the San Francisco office ~3 days a week (hybrid). Anthropic expects staff to be in one of its offices at least ~25% of the time; some roles may require more time in-office.

Logistics & Education

  • Minimum: Bachelor's degree in a related field or equivalent experience
  • Visa sponsorship: Anthropic may sponsor visas and, if an offer is made, will make reasonable efforts to obtain one

Benefits

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office in San Francisco

Additional notes

  • The team emphasizes collaborative, large-scale research efforts and values strong communication and impact-driven work. Familiarity with the team's prior work (circuits, monosemanticity, activation atlases, etc.) is helpful but not strictly required.