Research Engineer / Scientist, Alignment Science

at Anthropic

📍 London, United Kingdom

GBP 250,000-270,000 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes @ 3
Machine Learning @ 3
Communication @ 3
NLP @ 3
LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team is a fast-growing group of researchers, engineers, policy experts, and leaders building beneficial AI systems.

About the Role

Build and run elegant, thorough machine learning experiments to understand and steer the behavior of powerful AI systems. Focus on making AI helpful, honest, and harmless, with particular attention to the challenges posed by human-level capabilities. Work as both a scientist and an engineer on exploratory experimental research in AI safety, concentrating on risks from future powerful systems (such as those designated ASL-3 or ASL-4), often in collaboration with the Interpretability, Fine-Tuning, and Frontier Red Team teams.

Research Areas

AI Control: Methods to keep advanced AI safe and harmless in unfamiliar or adversarial scenarios.
Alignment Stress-testing: Develop model organisms of misalignment to understand alignment failures empirically.

Representative Projects

Test robustness of safety techniques by training language models to subvert them.
Run multi-agent reinforcement learning experiments (e.g., AI Debate).
Build tools to evaluate the effectiveness of LLM-generated jailbreaks.
Write scripts and prompts to generate evaluation questions that assess model reasoning in safety contexts.
Contribute ideas, visuals, and writing to research outputs.
Run experiments supporting key AI safety efforts like the Responsible Scaling Policy.

Candidate Profile

Required

Significant software, ML, or research engineering experience.
Experience contributing to empirical AI research.
Familiarity with technical AI safety research.
Enjoyment of fast-moving, collaborative projects.
Willingness to take on tasks beyond the strict job description.
Concern for the impacts of AI.

Strong Candidates May Also Have

Authorship of research papers in ML, NLP, or AI safety.
Experience with LLMs.
Experience with reinforcement learning.
Experience managing Kubernetes clusters and complex shared codebases.

Not Required

Meeting 100% of the listed skills.
Formal certifications or degrees.

Salary

Annual salary range: £250,000 to £270,000 (GBP)

Logistics

Requires at least a Bachelor's degree or equivalent experience.
Hybrid location policy: staff are expected to be in the office at least 25% of the time.
Based in London, with occasional travel to San Francisco.
Visa sponsorship is available; the company will make a reasonable effort to secure a visa for successful candidates.

How We're Different

Focus on large-scale, high-impact AI research.
Emphasis on empirical science, with collaboration and communication valued highly.
Research informed by the team's past work, including GPT-3, Interpretability, Scaling Laws, and AI Safety.

Benefits

Competitive compensation and benefits.
Optional equity donation matching.
Generous vacation and parental leave.
Flexible working hours.
Collaborative and lovely office space.

Additional

Guidance is provided on candidates’ use of AI during the application process.