Safeguards Enforcement Analyst, Safety Evaluations

at Anthropic

📍 Washington, United States
📍 New York City, United States
📍 San Francisco, United States

USD 230,000-270,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels are:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage and the ability to handle common tasks and challenges related to the technology;

7-9: expert. Able to teach others and familiar with the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

SQL @ 5, Scoping @ 3, Communication @ 6, Prioritization @ 6, Claude Code @ 3, AI @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Safeguards team enforces policies, protects users, and ensures the platform is not misused. This role focuses on Safety Evaluations — running and monitoring evaluations to ensure models meet safety and policy standards before and after launch, coordinating creation of new evals, driving mitigations, and building processes and documentation to scale evaluation work.

Responsibilities

Support model launch readiness by running evaluations, monitoring and interpreting results, and surfacing regressions or unexpected behavior changes to relevant stakeholders
Partner with policy and domain experts throughout the evaluation lifecycle — from identifying risks and scoping evaluation approaches to coordinating creation and ensuring evals remain current with evolving policies, threat vectors, and model capabilities
Work with cross-functional stakeholders to manage evaluation outcomes, interpret results, and drive mitigations where needed
Design processes and eval paradigms to keep evaluations high-signal and insightful as models improve
Build processes and frameworks for creating product-specific evaluations as Anthropic’s product surface expands
Help design and scope tooling improvements to support evolving eval needs and expand self-serve eval creation for non-technical users
Write and maintain rigorous documentation for evaluation creation, execution, and interpretation as eval tooling and processes are built out

Requirements / Qualifications

Experience in trust and safety, content operations, policy enforcement, or a related operational role at a technology company
Comfortable working in ambiguous, fast-moving environments and figuring out paths forward with incomplete information
Experience building processes, workflows, or programs from scratch (zero-to-one work)
Strong program management instincts: creating structure around complex, multi-stakeholder efforts and tracking timelines, dependencies, and deliverables
Eagerness to expand technical toolkit and adopt internal tools and AI-assisted workflows (e.g., Claude Code)
Ability to manage multiple concurrent workstreams across different domain areas with strong prioritization and context-switching
Strong written and cross-functional communication skills
At least a Bachelor’s degree in a related field, or equivalent experience

Strong candidates may also have

Experience operating under tight, high-stakes timelines (product launches, incident response, regulatory deadlines)
Experience coordinating across engineering, policy, and product teams to translate findings into concrete action
Experience building and maintaining SOPs, runbooks, and operational documentation in fast-changing environments
Proficiency with data tools (SQL, dashboards, spreadsheets) sufficient to maintain and improve workflows
Comfort working with sensitive content areas as part of eval creation or enforcement review responsibilities

Compensation

Annual Salary: $230,000 - $270,000 USD

Logistics

Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time; some roles may require more time in office
Remote-friendly (travel required)
Visa sponsorship: Anthropic sponsors visas and retains an immigration lawyer, though sponsorship availability may vary by role and candidate

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours and pleasant office spaces

How we work

Collaborative, research-driven approach focused on large-scale research efforts and high impact
Frequent research discussions and an emphasis on communication and cross-functional collaboration