Anthropic AI Safety Fellow, US
📍 Canada
📍 London, United Kingdom
📍 Berkeley, United States
📍 United States
Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 6 · Machine Learning @ 2 · Mathematics @ 6 · API @ 3
Details
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. The Anthropic Fellows Program is an external collaboration program focused on accelerating progress in AI safety research by giving promising talent an opportunity to gain research experience. The program runs for about 2 months, with the possibility of a roughly 4-month extension depending on how well the collaboration is going. Fellows use external infrastructure (e.g. open-source models, public APIs) to work on empirical projects aligned with Anthropic's research priorities and are expected to produce public outputs (for example, paper submissions). Fellows receive mentorship, funding, compute resources, access to shared workspaces, and a weekly stipend.
Responsibilities
- Work on an empirical AI safety research project using external infrastructure (open-source models, public APIs).
- Produce public outputs (e.g., paper submissions) aligned with research priorities.
- Receive mentorship from Anthropic researchers and participate in mentor matching and project selection.
- Participate in program activities (collaboration, research discussions, mentor meetings) during the fellowship period.
What to Expect / Benefits
- Direct mentorship from Anthropic researchers and connection to the broader AI safety research community.
- Weekly stipend of $2,100 USD and access to benefits (including access to medical, dental, and vision insurance, a Health Savings Account, an Employee Assistance Program, and a 401(k) retirement plan) through the third-party employer of record where applicable.
- Funding for compute and other research expenses.
- Shared workspaces in Berkeley, California and London, UK.
- Opportunity to transition into full-time empirical AI safety research; strong performers may be considered for future full-time roles.
Mentors & Research Areas
- Potential mentors include Ethan Perez, Jan Leike, Emmanuel Ameisen, Jascha Sohl-Dickstein, Sara Price, Samuel Marks, Joe Benton, Akbir Khan, Fabien Roger, Alex Tamkin, Nina Panickssery, Collin Burns, Jack Lindsey, Trenton Bricken, Evan Hubinger.
- Representative research areas: Scalable Oversight; Adversarial Robustness and AI Control; Model Organisms; Model Internals / Mechanistic Interpretability; AI Welfare.
Requirements
- Motivation to reduce catastrophic risks from advanced AI systems.
- Strong technical background in computer science, mathematics, physics, or related fields (Bachelor's degree or equivalent experience required).
- Strong programming skills, particularly in Python.
- Familiarity with machine learning frameworks.
- Ability to work full-time on the fellowship for at least 2 months (ideally 6 months); the program expects 40 hours/week.
- Have or can obtain work authorization in the US, UK, or Canada; able to work full-time out of Berkeley or London (or remotely if in Canada). Anthropic cannot generally sponsor visas for Fellows, but can support F-1 visa holders eligible for full-time OPT/CPT.
- Ability to execute projects independently while incorporating feedback; thrive in fast-paced, collaborative environments.
Strong Candidates May Also Have
- Experience with empirical ML research projects and working with large language models.
- Experience in one of the highlighted research areas (e.g., interpretability).
- Experience with deep learning frameworks and experiment management.
- Track record of open-source contributions.
Candidates Need Not Have
- 100% of the listed skills or formal certifications/education credentials.
Interview Process
- Applications are reviewed on a rolling basis until Aug 17.
- Stages: a technical assessment (90-minute Python coding screen), a 55-minute coding-based technical interview (not ML-focused), and a final stage consisting of a research discussion and a take-home research project (5-hour work period plus a 30-minute review).
- Offers are expected to be extended by early October.
Compensation
- Weekly stipend: $2,100 USD (expected 40 hours/week).
- Annualized salary listed: $109,200 USD (posted as a fixed, single-value range).
Role-Specific Location Policy
- This role can be done remotely from anywhere in the United States (exempt from the usual 25% in-office expectation).
- Shared workspaces available in Berkeley, CA and London, UK; preference for candidates who can be based in the Bay Area.
- Work authorization: US, UK, or Canadian authorization required as described above.
Logistics & Other
- Education requirement: at least a Bachelor's degree in a related field or equivalent experience.
- Emphasis on collaborative, high-impact, large-scale AI research.
- Guidance on candidates' AI usage in the application process is provided on Anthropic's site.