Research Engineer / Scientist, Alignment Science, London

£250,000–£270,000 per year
Seniority: Middle
Hybrid


Used Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes (3), Machine Learning (3), Communication (3), NLP (3), LLM (3)

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Role overview

You will build and run thorough machine learning experiments to help understand and steer the behavior of powerful AI systems. You should care about making AI helpful, honest, and harmless, and be interested in the challenges that arise as systems approach human-level capabilities. This role sits at the intersection of science and engineering: as a Research Engineer on the Alignment Science team, you'll contribute to exploratory experimental research on AI safety (with a focus on risks from powerful future systems), often collaborating with Interpretability, Fine-Tuning, and Frontier Red Team colleagues.

Representative projects

  • Test the robustness of safety techniques by training language models to subvert safety interventions and measuring how effective those interventions remain.
  • Run multi-agent reinforcement learning experiments to evaluate techniques such as AI Debate.
  • Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
  • Write scripts and prompts to produce evaluation questions testing models’ reasoning abilities in safety-relevant contexts.
  • Contribute ideas, figures, and writing to research papers, blog posts, and talks.
  • Run experiments that inform AI safety efforts such as the Responsible Scaling Policy.

Responsibilities

  • Design, implement, and run machine learning experiments related to alignment and safety.
  • Build tooling and evaluation pipelines for testing model behavior and jailbreaks.
  • Collaborate with cross-functional teams (e.g., Interpretability, Fine-Tuning, Red Team) on experimental design and analysis.
  • Document results and contribute to papers, blog posts, and internal research communication.

Requirements

  • Significant software, machine learning, or research engineering experience.
  • Experience contributing to empirical AI research projects.
  • Familiarity with technical AI safety research.
  • Comfortable in fast-moving collaborative projects and willing to work outside strict job boundaries when needed.
  • Care about the impacts of AI.
  • At least a Bachelor’s degree in a related field, or equivalent experience.

Strong candidates may also have

  • Experience authoring research papers in machine learning, NLP, or AI safety.
  • Experience with large language models (LLMs).
  • Experience with reinforcement learning and multi-agent experiments.
  • Experience with Kubernetes clusters and complex shared codebases.

Candidates need not have

  • 100% of the listed skills.
  • Formal certifications or education credentials (equivalent experience is acceptable).

Logistics

  • Location: London, UK, with occasional travel to San Francisco.
  • Hybrid policy: staff are expected to be in the London office at least 25% of the time.
  • Visa sponsorship: Anthropic sponsors visas where possible and retains immigration legal counsel to assist with the process.

Benefits and culture

Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces. The company emphasizes big-science, high-impact empirical AI research and values strong communication and collaboration skills.

Compensation

  • Annual salary: £250,000–£270,000

How to apply

We encourage candidates to apply even if they do not meet every listed qualification. Anthropic values diversity and inclusion and provides guidance on the use of AI tools during the application process.