Anthropic AI Safety Fellow
at Anthropic
📍 Canada
📍 United Kingdom
📍 United States
📍 London, United Kingdom
📍 Berkeley, United States
📍 San Francisco, United States
USD 200,200 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 5, Mathematics @ 6, Mentoring @ 3, API @ 3
Details
Anthropic’s Fellows Program accelerates AI safety research by funding and mentoring promising technical talent for a four-month empirical research project. Fellows primarily use external infrastructure (open-source models, public APIs) to conduct an empirical project aligned with Anthropic’s research priorities and aim to produce a public output (e.g., a paper submission). This application is for cohorts starting in May and July 2026. Please apply by January 12, 2026.
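For illustration only (not part of the original posting): a minimal sketch of what working with "external infrastructure" can look like in practice, here loading a small open-source model through the Hugging Face transformers library. The model name and prompt are assumptions chosen for brevity, not program requirements.

```python
# Hypothetical sketch: querying an open-source model locally with the
# Hugging Face `transformers` library (no proprietary API access needed).
from transformers import pipeline

# Load a small open-source model; "gpt2" is an illustrative choice only.
generator = pipeline("text-generation", model="gpt2")

# Run a single prompt deterministically and inspect the raw completion.
prompt = "One open problem in scalable oversight is"
result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```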
Responsibilities
- Conduct an empirical AI safety research project over four months using external infrastructure (open-source models, public APIs).
- Work toward producing a public research output (paper, blog post, dataset, or open-source artifact).
- Participate in mentor matching and project selection processes.
- Collaborate with Anthropic researchers and the broader AI safety research community.
- Attend weekly research discussions and check-ins with mentors.
Requirements
- Must be fluent in Python programming.
- Available to work full-time (40 hours per week) on the Fellows program for four months.
- Have work authorization and be located in the US, UK, or Canada for the duration of the program (Anthropic is not able to sponsor visas for fellows).
- Strong technical background in computer science, mathematics, physics, cybersecurity, or a related field (a Bachelor’s degree or equivalent experience is required for logistical reasons).
- Thrive in fast-paced, collaborative environments and be able to implement ideas quickly and communicate clearly.
Strong candidates may also have:
- Experience with empirical ML research projects.
- Experience working with large language models (LLMs).
- Experience in AI safety research areas (scalable oversight, adversarial robustness, model organisms, mechanistic interpretability, AI welfare).
- Experience with deep learning frameworks and experiment management.
- Track record of open-source contributions.
Logistics & eligibility:
- Fellows must have work authorization in the US, UK, or Canada and be located in that country during the program.
- We are open to remote fellows in the UK, US, or Canada; shared workspaces are available in London, UK and Berkeley, California, and mentors will visit these spaces.
- Visa sponsorship is not available for fellows.
Interview process:
- Initial application and reference check, technical assessments and interviews, and a research discussion.
Compensation
- Weekly stipend: 3,850 USD / 2,310 GBP / 4,300 CAD (roughly 200,200 USD annualized over 52 weeks, matching the salary figure above).
- Funding for compute (~$15k/month) and other research expenses.
- Expected commitment: 40 hours per week for 4 months (with possible extension).
Mentorship & Research Areas
- Direct mentorship from Anthropic researchers and access to a broader AI safety research community.
- Example mentors include Jan Leike, Sam Bowman, Nicholas Carlini, Jascha Sohl-Dickstein, and others.
- Research areas include scalable oversight, adversarial robustness and AI control, model organisms, mechanistic interpretability, and AI welfare.
Benefits
- Access to a shared workspace (London or Berkeley) and remote options in the UK, US, or Canada.
- Compute funding and research expense support.
- Connection to Anthropic researchers and potential pathways to full-time research roles (no guarantee of full-time offers).
How to Apply
- Apply through the Constellation application portal (link provided in the original posting).
Additional Notes
- Anthropic encourages applicants from diverse backgrounds and explicitly invites candidates to apply even if they do not meet every listed qualification.
- Applications are due January 12, 2026.