Anthropic AI Safety Fellow, UK

GBP 67,600 per year
Level: Middle
Remote / Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 6
  • Machine Learning @ 3
  • Mathematics @ 6
  • API @ 3

Details

Anthropic’s Fellows Program is an external collaboration program focused on accelerating progress in AI safety research by providing promising talent with an opportunity to gain empirical research experience. The program runs for about 2 months with the possibility of extension for another 4 months (possible total of ~6 months). Fellows work on an empirical project aligned with Anthropic’s research priorities, using external infrastructure (e.g., open-source models, public APIs) and aim to produce a public output (for example, a paper submission). Fellows receive mentorship from Anthropic researchers, funding for compute and research expenses, access to shared workspaces, and a weekly stipend.

Responsibilities

  • Conduct an empirical AI safety research project using external infrastructure (open-source models, public APIs).
  • Produce a public output from the project (e.g., paper submission) with substantial support and mentorship.
  • Work closely with assigned mentors and participate in project selection and mentor matching.
  • Collaborate with the broader AI safety research community and incorporate feedback on research direction.

What to Expect / Benefits

  • Direct mentorship from Anthropic researchers and connection to the AI safety research community.
  • Weekly stipend of £1,300 and access to benefits (benefits vary by country and may include medical, dental, and vision insurance).
  • Funding for compute and other research expenses.
  • Shared workspaces in Berkeley, California and London, UK.
  • Employed via a third-party talent partner; may be eligible for benefits through the employer of record.

Mentors & Research Areas

Potential mentors include (representative list): Ethan Perez; Jan Leike; Emmanuel Ameisen; Jascha Sohl-Dickstein; Sara Price; Samuel Marks; Joe Benton; Akbir Khan; Fabien Roger; Alex Tamkin; Nina Panickssery; Collin Burns; Jack Lindsey; Trenton Bricken; Evan Hubinger.

Representative research areas include:

  • Scalable oversight
  • Adversarial robustness and AI control
  • Model organisms of misalignment
  • Model internals / mechanistic interpretability
  • AI welfare

Requirements

  • Motivation to reduce catastrophic risks from advanced AI systems.
  • Strong technical background in computer science, mathematics, physics, or related fields (Bachelor's degree or equivalent experience required).
  • Strong programming skills, particularly in Python.
  • Experience with machine learning frameworks and comfort with ML tooling.
  • Ability to work full-time on the fellowship for at least 2 months and ideally 6 months.
  • Have or be able to obtain US, UK, or Canadian work authorization, and be able to work full-time out of Berkeley or London (or remotely if based in Canada).
  • Thrive in fast-paced, collaborative environments and be able to execute projects independently while incorporating feedback.

Notes on work authorization and visas in the posting:

  • The posting states that fellows must have or be able to obtain US, UK, or Canadian work authorization, and notes support for Fellows on F-1 visas who are eligible for full-time OPT/CPT.
  • The posting contains two statements about visa sponsorship: a logistics section with a general statement about visa sponsorship efforts, and an earlier banner noting that certain logistics (for example, visa sponsorship) do not apply to this specific posting. Both statements appear in the original text.

Strong candidates may also have

  • Experience with empirical ML research projects
  • Experience working with large language models
  • Experience in interpretability or other listed research areas
  • Experience with deep learning frameworks and experiment management
  • Track record of open-source contributions

Candidates need not have

  • 100% of the listed skills or formal certifications/education credentials

Interview process

  • Applications are reviewed on a rolling basis; apply by August 17 for consideration. Onboarding for the next cohort is expected in October 2025 (later start dates are possible).
  • Stages:
    1. Initial application and references (due by August 17).
    2. Technical assessment: 90-minute coding screen in Python (refactoring and adapting to new requirements).
    3. Technical interview: 55-minute coding-based interview (does not involve ML).
    4. Final interviews: a research discussion (15 minutes) and a take-home research-focused project (5-hour work period plus a 30-minute review). Reference calls are conducted in parallel.
  • Anthropic aims to extend offers by early October; offer and extension decisions follow the cohort schedule in the posting.

Compensation (GBP)

  • Stipend: £1,300 per week (expected 40 hours/week).
  • Annual salary shown in the posting: £67,600.

Role-Specific Location Policy

  • This role is exempt from Anthropic's general expectation that staff be in the office at least 25% of the time; it can be done remotely from anywhere in the UK.
  • Anthropic strongly prefers candidates who can be based in London and use the shared workspace for Fellows.
  • Shared workspaces are available in Berkeley, CA and London, UK.

Logistics & Additional Notes

  • Education requirement: at least a Bachelor's degree in a related field or equivalent experience.
  • The posting encourages applications from underrepresented groups and notes that not all strong candidates will meet every qualification.
  • Anthropic is a public benefit corporation headquartered in San Francisco and describes broader company benefits and policies elsewhere in the posting.

(Original posting contains more details and mentor/project links; this description preserves the key responsibilities, requirements, compensation, locations, and interview process.)