Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
SQL @ 5
Scoping @ 3
Communication @ 6
Prioritization @ 6
Claude Code @ 3
AI @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Safeguards team enforces policies, protects users, and ensures the platform is not misused. This role focuses on Safety Evaluations — running and monitoring evaluations to ensure models meet safety and policy standards before and after launch, coordinating creation of new evals, driving mitigations, and building processes and documentation to scale evaluation work.
Responsibilities
- Support model launch readiness by running evaluations, monitoring and interpreting results, and surfacing regressions or unexpected behavior changes to relevant stakeholders
- Partner with policy and domain experts throughout the evaluation lifecycle — from identifying risks and scoping evaluation approaches to coordinating creation and ensuring evals remain current with evolving policies, threat vectors, and model capabilities
- Work with cross-functional stakeholders to manage evaluation outcomes, interpret results, and drive mitigations where needed
- Design processes and eval paradigms to keep evaluations high-signal and insightful as models improve
- Build processes and frameworks for creating product-specific evaluations as Anthropic’s product surface expands
- Help design and scope tooling improvements to support evolving eval needs and expand self-serve eval creation for non-technical users
- Write and maintain rigorous documentation for evaluation creation, execution, and interpretation as eval tooling and processes are built out
Requirements / Qualifications
- Experience in trust and safety, content operations, policy enforcement, or a related operational role at a technology company
- Comfortable working in ambiguous, fast-moving environments and figuring out paths forward with incomplete information
- Experience building processes, workflows, or programs from scratch (zero-to-one work)
- Strong program management instincts: creating structure around complex, multi-stakeholder efforts, tracking timelines, dependencies, and deliverables
- Eagerness to expand your technical toolkit and adopt internal tools and AI-assisted workflows (e.g., Claude Code)
- Ability to manage multiple concurrent workstreams across different domain areas with strong prioritization and context-switching
- Strong written and cross-functional communication skills
- We require at least a Bachelor’s degree in a related field or equivalent experience
Strong candidates may also have
- Experience operating under tight, high-stakes timelines (product launches, incident response, regulatory deadlines)
- Experience coordinating across engineering, policy, and product teams to translate findings into concrete action
- Experience building and maintaining SOPs, runbooks, and operational documentation in fast-changing environments
- Proficiency with data tools (SQL, dashboards, spreadsheets) sufficient to maintain and improve workflows
- Comfort working with sensitive content areas as part of eval creation or enforcement review responsibilities
Compensation
- Annual Salary: $230,000 - $270,000 USD
Logistics
- Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time; some roles may require more time in office
- Remote-friendly (travel required)
- Visa sponsorship: Anthropic does sponsor visas and retains an immigration lawyer, though sponsorship availability may vary by role/candidate
Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours and pleasant office spaces
How we work
- Collaborative, research-driven approach focused on large-scale research efforts and high impact
- Frequent research discussions and an emphasis on communication and cross-functional collaboration