Anthropic AI Safety Fellow
at Anthropic
📍 Canada
📍 United Kingdom
📍 United States
📍 London, United Kingdom
📍 Berkeley, United States
📍 San Francisco, United States
USD 200,200 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 5, Mathematics @ 6, Mentoring @ 3, API @ 3
Details
Anthropic’s Fellows Program accelerates AI safety research by funding and mentoring promising technical talent for a four-month empirical research project. Fellows primarily use external infrastructure (open-source models, public APIs) to conduct an empirical project aligned with Anthropic’s research priorities and aim to produce a public output (e.g., a paper submission). This application is for cohorts starting in May and July 2026. Please apply by January 12, 2026.
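For illustration only (not part of the original posting): a minimal sketch of what working with "external infrastructure" can look like in practice, here loading a small open-source model through the Hugging Face transformers library. The model name and prompt are assumptions chosen for brevity, not program requirements.

```python
# Hypothetical sketch: querying an open-source model locally with the
# Hugging Face `transformers` library (no proprietary API access needed).
from transformers import pipeline

# Load a small open-source model; "gpt2" is an illustrative choice only.
generator = pipeline("text-generation", model="gpt2")

# Run a single prompt deterministically and inspect the raw completion.
prompt = "One open problem in scalable oversight is"
result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```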
Responsibilities
- Conduct an empirical AI safety research project over four months using external infrastructure (open-source models, public APIs).
- Work toward producing a public research output (paper, blog post, dataset, or open-source artifact).
- Participate in mentor matching and project selection processes.
- Collaborate with Anthropic researchers and the broader AI safety research community.
- Attend weekly research discussions and check-ins with mentors.
Requirements
- Must be fluent in Python programming.
- Available to work full-time (40 hours per week) on the Fellows program for four months.
- Have work authorization and be located in the US, UK, or Canada for the duration of the program (Anthropic is not able to sponsor visas for fellows).
- Strong technical background in computer science, mathematics, physics, cybersecurity, or a related field (a Bachelor’s degree or equivalent experience is required for logistical reasons).
- Thrive in fast-paced, collaborative environments and be able to implement ideas quickly and communicate clearly.
Strong candidates may also have:
- Experience with empirical ML research projects.
- Experience working with large language models (LLMs).
- Experience in AI safety research areas (scalable oversight, adversarial robustness, model organisms, mechanistic interpretability, AI welfare).
- Experience with deep learning frameworks and experiment management.
- Track record of open-source contributions.
Logistics & eligibility:
- Fellows must have work authorization in the US, UK, or Canada and be located in that country during the program.
- We are open to remote fellows in the UK, US, or Canada; shared workspaces are available in London, UK and Berkeley, California, and mentors will visit these spaces.
- Visa sponsorship is not available for fellows.
Interview process:
- Initial application and reference check, technical assessments and interviews, and a research discussion.
Compensation
- Weekly stipend: 3,850 USD / 2,310 GBP / 4,300 CAD (roughly 200,200 USD annualized over 52 weeks, matching the salary figure above).
- Funding for compute (~$15k/month) and other research expenses.
- Expected commitment: 40 hours per week for 4 months (with possible extension).
Mentorship & Research Areas
- Direct mentorship from Anthropic researchers and access to a broader AI safety research community.
- Example mentors include Jan Leike, Sam Bowman, Nicholas Carlini, Jascha Sohl-Dickstein, and others.
- Research areas include scalable oversight, adversarial robustness and AI control, model organisms, mechanistic interpretability, and AI welfare.
Benefits
- Access to a shared workspace (London or Berkeley) and remote options in the UK, US, or Canada.
- Compute funding and research expense support.
- Connection to Anthropic researchers and potential pathways to full-time research roles (no guarantee of full-time offers).
How to Apply
- Apply through the Constellation application portal (link provided in the original posting).
Additional Notes
- Anthropic encourages applicants from diverse backgrounds and explicitly invites candidates to apply even if they do not meet every listed qualification.
- Applications are due January 12, 2026.