Software Engineer, Safeguards

at Anthropic

📍 New York City, United States
📍 San Francisco, United States

USD 320,000-425,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Not specified

Required Skills & Competences

TypeScript @ 5, Python @ 5, Communication @ 6, API @ 3, Fraud @ 7

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Safeguards team builds safety and oversight mechanisms for AI systems to monitor models, prevent misuse, and ensure user well-being. This role focuses on building systems to detect unwanted model behaviors and prevent disallowed use of models while upholding principles of safety, transparency, and oversight.

Responsibilities

Develop monitoring systems to detect unwanted behaviors from API partners and potentially take automated enforcement actions
Surface detections in internal dashboards for analyst review
Build abuse detection mechanisms and supporting infrastructure
Surface abuse patterns to research teams to harden models at training time
Build robust, reliable multi-layered defenses for real-time safety improvements that work at scale
Work across the stack to implement detection and enforcement systems

Requirements

Bachelor’s degree in Computer Science, Software Engineering, or equivalent experience
5–10+ years of experience in a software engineering role, preferably focused on integrity, spam, fraud, or abuse detection and mitigation
Proficiency in Python and TypeScript
Ability to work across the stack (full-stack development)
Strong communication skills; able to explain complex technical concepts to non-technical stakeholders

Strong candidates may also

Have experience building trust-and-safety detection mechanisms and interventions for AI/ML systems
Have experience with prompt engineering, jailbreak attacks, and other adversarial inputs
Have worked closely with operational teams to build custom internal tooling

Logistics

Locations: San Francisco, CA and New York City, NY (United States)
Location-based hybrid policy: staff are expected to be in an office at least 25% of the time
Education: at least a Bachelor’s degree in a related field or equivalent experience
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to help with sponsorship once it makes an offer

Compensation and benefits

Annual salary range: $320,000 - $425,000 USD
Total compensation for full-time employees includes equity and benefits
Company highlights: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office collaboration spaces

Additional notes

Applications are reviewed on a rolling basis (no stated application deadline)
The posting encourages candidates from diverse and underrepresented groups to apply
Anthropic provides guidance on candidate AI usage and on its application process policies