Research Engineer / Scientist, Alignment Science, London

GBP 250,000-270,000 per year
Seniority: Middle
Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Kubernetes (level 3)
  • Python (level 3)
  • Machine Learning (level 3)
  • NLP (level 3)
  • LLM (level 3)

Details

You will build and run machine learning experiments to help understand and steer the behavior of powerful AI systems. The role sits on the Alignment Science team and involves empirical research on AI safety, often in collaboration with teams such as Interpretability, Fine-Tuning, and the Frontier Red Team. Interviews for this role are conducted in Python.

Responsibilities

  • Design, implement, and run elegant and thorough ML experiments to evaluate model behavior and safety.
  • Test the robustness of safety techniques by training models to subvert safety interventions and evaluating how effective those interventions remain.
  • Run multi-agent reinforcement learning experiments (e.g., AI Debate) and other RL experiments relevant to alignment.
  • Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks and other adversarial behaviors.
  • Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts.
  • Contribute ideas, figures, and writing to research outputs such as papers, blog posts, and talks.
  • Run experiments that inform key AI safety efforts such as the Responsible Scaling Policy.

Requirements

  • Significant software, machine learning, or research engineering experience.
  • Some experience contributing to empirical AI research projects.
  • Familiarity with technical AI safety research.
  • Comfortable working on fast-moving collaborative projects and picking up work outside a narrow job description.
  • Ability to communicate and collaborate effectively across research and engineering teams.

Strong candidates may also have:

  • Experience authoring research papers in ML, NLP, or AI safety.
  • Experience with large language models (LLMs).
  • Experience with reinforcement learning and multi-agent RL experiments.
  • Experience with Kubernetes clusters and working within complex, shared codebases.

Candidates need not have 100% of the listed skills; formal certifications are not required.

Representative projects

  • Testing safety technique robustness by training models to subvert those techniques.
  • Running multi-agent RL experiments to explore approaches like AI Debate.
  • Building evaluation tooling for LLM-generated jailbreaks.
  • Creating prompts and scripts for targeted evaluations of reasoning and safety behaviors.
  • Contributing to research outputs that inform Anthropic’s safety policies and publications.

Logistics

  • Location: London, UK (candidates are expected to be in the London office at least 25% of the time and to travel occasionally to San Francisco).
  • Education: at least a Bachelor's degree in a related field or equivalent experience is required.
  • Visa: Anthropic does sponsor visas and will make reasonable efforts to secure one if an offer is made.
  • Interviews: conducted in Python.

Compensation and Benefits

  • Annual base salary: £250,000 - £270,000.
  • Total compensation for full-time employees may include equity, benefits, and incentive compensation.
  • Additional benefits: competitive benefits package, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration.

How to apply

  • Applicants are encouraged to apply even if they do not meet every qualification.
  • Application materials requested include a Resume/CV or LinkedIn profile; other fields and prompts are provided during the application.