Research Engineer / Scientist, Alignment Science

at Anthropic

📍 San Francisco, United States

USD 315,000-340,000 per year

MIDDLE

✅ Hybrid

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ^?

Kubernetes @ 3 Machine Learning @ 3 Communication @ 3 NLP @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team combines researchers, engineers, policy experts, and business leaders to build beneficial AI systems.

Responsibilities

Build and run elegant and thorough machine learning experiments to understand and steer the behavior of powerful AI systems.
Conduct exploratory experimental research on AI safety focusing on risks from powerful future systems.
Collaborate with teams including Interpretability, Fine-Tuning, and Frontier Red Team.
Work on scalable oversight techniques to keep highly capable models helpful and honest.
Develop AI control methods to ensure safety in unfamiliar or adversarial scenarios.
Create model organisms of misalignment to empirically understand alignment failures.
Build and align automated systems to speed up alignment research.
Conduct alignment assessments for pre-deployment safety.
Develop robust defenses and evaluation frameworks for model safety and mitigate risks before deployment.
Investigate model welfare, moral status, and related ethical questions.
Test robustness of safety techniques by training language models to subvert these techniques.
Run multi-agent reinforcement learning experiments (e.g., AI Debate).
Build tools to evaluate model jailbreaks.
Write scripts and prompts to produce evaluation questions on model reasoning in safety contexts.
Contribute to research papers, blog posts, and talks related to AI safety research.

Requirements

Significant software, machine learning, or research engineering experience.
Experience contributing to empirical AI research projects.
Familiarity with technical AI safety research.
Preference for fast-moving collaborative projects.
Willingness to take on tasks beyond job description.
Interest in AI impacts.

Strong Candidates May Also Have

Experience authoring research papers in machine learning, NLP, or AI safety.
Experience with large language models (LLMs).
Experience with reinforcement learning.
Experience with Kubernetes clusters and complex shared codebases.

Candidates Need Not Have

All required skills or formal certifications.

Benefits and Logistics

Education: Bachelor's degree or equivalent experience required.
Hybrid work policy: Expect to be onsite at least 25% of the time.
Visa sponsorship available with reasonable effort and immigration lawyer support.
Competitive compensation and benefits, equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office environment.

Company Insight

Anthropic sees AI research as big science requiring collaborative, high-impact projects.
Emphasis on communication and empirical science approach akin to physics and biology.
Research directions include scalable oversight, AI control, interpretability, scaling laws, AI safety, and learning from human preferences.

Salary

Annual salary range: $315,000 - $340,000 USD.