Research Engineer / Scientist, Robustness

USD 315,000-560,000 per year
MIDDLE
✅ Hybrid


Used Tools & Technologies

Not specified

Required Skills & Competences

  • Kubernetes (3)
  • Machine Learning (3)
  • Communication (3)
  • NLP (3)
  • LLM (3)

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial. The Robustness Team, part of the Alignment Science team, conducts safety research and engineering to ensure AI systems can be deployed safely. Projects span jailbreak robustness, automated red-teaming, monitoring techniques, applied threat modeling, and other work to enable safe deployment of more advanced AI systems.

About the role

You will take a pragmatic approach to running machine learning experiments to help understand and steer the behavior of powerful AI systems. You will operate as both scientist and engineer, working on risks from powerful future systems (ASL-3/ASL-4) as well as risks occurring today, collaborating with teams such as Interpretability, Fine-Tuning, and the Frontier Red Team.

Representative projects include:

  • Testing robustness of safety techniques by training models to subvert interventions.
  • Running multi-agent reinforcement learning experiments to test techniques like AI Debate.
  • Building tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
  • Writing scripts and prompts to produce evaluation questions testing models' reasoning in safety-relevant contexts.
  • Contributing to research papers, blog posts, and talks.
  • Running experiments that inform AI safety efforts such as the Responsible Scaling Policy.

Responsibilities

  • Design and run ML experiments to evaluate and improve robustness of AI safety techniques.
  • Develop tooling and scripts for efficient evaluation and generation of evaluation data and jailbreaks.
  • Collaborate with the Interpretability, Fine-Tuning, and Frontier Red Team groups to integrate findings.
  • Contribute to research outputs (papers, figures, writing) and cross-team safety initiatives.

Requirements

  • Significant software, ML, or research engineering experience.
  • Some experience contributing to empirical AI research projects.
  • Some familiarity with technical AI safety research.
  • Preference for collaborative, fast-moving projects and willingness to work beyond strict role boundaries.
  • Care about the impacts of AI.
  • Minimum: Bachelor's degree in a related field or equivalent experience.

Strong candidates may also have:

  • Experience authoring research papers in machine learning, NLP, or AI safety.
  • Experience with large language models (LLMs).
  • Experience with reinforcement learning (including multi-agent RL).
  • Experience with Kubernetes clusters and complex shared codebases.

Compensation

  • Annual Salary: $315,000 - $560,000 USD

Logistics & Office Policy

  • Location: San Francisco, CA (preference for Bay Area-based candidates; open to candidates who can travel ~25% to the Bay Area).
  • Location-based hybrid policy: staff are expected to be in one of the offices at least 25% of the time.
  • Visa sponsorship: Anthropic does sponsor visas and retains immigration legal support, though not all roles/candidates can be successfully sponsored.

How we're different / Culture

Anthropic emphasizes large-scale, high-impact empirical AI research done collaboratively. Communication skills and cross-disciplinary collaboration are highly valued. Benefits mentioned include competitive compensation, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a San Francisco office space.

Encouragement

Anthropic encourages applications from candidates who may not meet every qualification and values diverse perspectives in AI safety work.