Anthropic AI Safety Fellow

USD 200,200 per year
Mid-level
Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 5
  • Mathematics @ 6
  • Mentoring @ 3
  • API @ 3

Details

Anthropic’s Fellows Program accelerates AI safety research by funding and mentoring promising technical talent for a four-month empirical research project. Fellows primarily use external infrastructure (open-source models, public APIs) to conduct an empirical project aligned with Anthropic’s research priorities and aim to produce a public output (e.g., a paper submission). This application is for cohorts starting in May and July 2026. Please apply by January 12, 2026.

Responsibilities

  • Conduct an empirical AI safety research project over four months using external infrastructure (open-source models, public APIs).
  • Work toward producing a public research output (paper, blog post, dataset, or open-source artifact).
  • Participate in mentor matching and project selection processes.
  • Collaborate with Anthropic researchers and the broader AI safety research community.
  • Attend weekly research discussions and check-ins with mentors.

Requirements

  • Fluency in Python programming.
  • Availability to work full-time (40 hours per week) on the Fellows program for four months.
  • Work authorization in, and location within, the US, UK, or Canada for the duration of the program (Anthropic is not able to sponsor visas for fellows).
  • A strong technical background in computer science, mathematics, physics, cybersecurity, or a related field (a Bachelor's degree or equivalent experience is required for logistical reasons).
  • The ability to thrive in fast-paced, collaborative environments, implement ideas quickly, and communicate clearly.

Strong candidates may also have:

  • Experience with empirical ML research projects.
  • Experience working with large language models (LLMs).
  • Experience in AI safety research areas (scalable oversight, adversarial robustness, model organisms, mechanistic interpretability, AI welfare).
  • Experience with deep learning frameworks and experiment management.
  • Track record of open-source contributions.

Logistics & eligibility:

  • Fellows must have work authorization in the US, UK, or Canada and be located in that country during the program.
  • Remote fellows are welcome in the US, UK, or Canada; shared workspaces are available in London, UK, and Berkeley, California, and mentors will visit these spaces.
  • Visa sponsorship is not available for fellows.

Interview process:

  • Initial application and reference check, technical assessments and interviews, and a research discussion.

Compensation

  • Weekly stipend: 3,850 USD / 2,310 GBP / 4,300 CAD (the USD rate annualizes to the 200,200 USD figure listed above: 3,850 × 52 = 200,200).
  • Funding for compute (~$15k/month) and other research expenses.
  • Expected commitment: 40 hours per week for 4 months (with possible extension).

Mentorship & Research Areas

  • Direct mentorship from Anthropic researchers and access to a broader AI safety research community.
  • Example mentors include Jan Leike, Sam Bowman, Nicholas Carlini, Jascha Sohl-Dickstein, and others.
  • Research areas include scalable oversight, adversarial robustness and AI control, model organisms, mechanistic interpretability, and AI welfare.

Benefits

  • Access to a shared workspace (London or Berkeley) and remote options in the UK, US, or Canada.
  • Compute funding and research expense support.
  • Connection to Anthropic researchers and potential pathways to full-time research roles (no guarantee of full-time offers).

How to Apply

  • Apply through the Constellation application portal (link provided in the original posting).

Additional Notes

  • Anthropic encourages applicants from diverse backgrounds and explicitly invites candidates to apply even if they do not meet every listed qualification.
  • Applications are due January 12, 2026.