Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes (3), Machine Learning (3), Communication (3), NLP (3), LLM (3)
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The Safeguards Research Team conducts critical safety research and engineering to ensure AI systems can be deployed safely. The role focuses on risks from advanced AI systems (ASL-3 and beyond), with projects including jailbreak robustness, automated red-teaming, monitoring techniques, and applied threat modeling.
Responsibilities
- Test robustness of safety techniques by training language models to subvert them.
- Run multi-agent reinforcement learning experiments such as AI Debate.
- Build tooling to evaluate LLM-generated jailbreak effectiveness.
- Write scripts and prompts for evaluating models' reasoning in safety-relevant contexts.
- Contribute to research papers, blog posts, and talks.
- Run experiments informing AI safety efforts and policies.
Requirements
- Significant software, machine learning, or research engineering experience.
- Some experience contributing to empirical AI research projects.
- Familiarity with technical AI safety research.
- Collaborative mindset and willingness to take on tasks beyond the strict job description.
- Care about the impacts of AI.
Strong candidates may also have
- Experience authoring ML, NLP, or AI safety research papers.
- Experience with large language models (LLMs).
- Experience with reinforcement learning.
- Experience with Kubernetes clusters and complex shared codebases.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Office space in San Francisco.
- Visa sponsorship available with immigration support.
- Hybrid work policy requiring presence in the office at least 25% of the time.
Additional Details
- Requires at least a Bachelor's degree or equivalent experience.
- Preference for candidates able to be based in the Bay Area or to travel there ~25% of the time.
About Anthropic
Anthropic values high-impact, "big science" AI research, emphasizing collaborative and empirical work with frequent research discussions. The team pursues trustworthy and steerable AI while valuing communication skills and diverse perspectives.