Software Engineer, Safeguards

USD 320,000-425,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

Not specified

Required Skills & Competences

  • TypeScript @ 5
  • Python @ 5
  • Communication @ 6
  • API @ 3
  • Fraud @ 7

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Safeguards team builds safety and oversight mechanisms for AI systems to monitor models, prevent misuse, and ensure user well-being. This role focuses on building systems to detect unwanted model behaviors and prevent disallowed use of models while upholding principles of safety, transparency, and oversight.

Responsibilities

  • Develop monitoring systems to detect unwanted behaviors from API partners and potentially take automated enforcement actions
  • Surface detections in internal dashboards for analyst review
  • Build abuse detection mechanisms and supporting infrastructure
  • Surface abuse patterns to research teams to harden models at training time
  • Build robust, reliable multi-layered defenses for real-time safety improvements that work at scale
  • Work across the stack to implement detection and enforcement systems

Requirements

  • Bachelor’s degree in Computer Science, Software Engineering, or equivalent experience
  • 5–10+ years of experience in a software engineering role, preferably focused on integrity, spam, fraud, or abuse detection and mitigation
  • Proficiency in Python and TypeScript
  • Ability to work across the stack (full-stack development)
  • Strong communication skills; able to explain complex technical concepts to non-technical stakeholders

Strong candidates may also

  • Have experience building trust-and-safety detection mechanisms and interventions for AI/ML systems
  • Have experience with prompt engineering, jailbreak attacks, and other adversarial inputs
  • Have worked closely with operational teams to build custom internal tooling

Logistics

  • Locations: San Francisco, CA and New York City, NY (United States)
  • Location-based hybrid policy: staff are expected to be in an office at least 25% of the time
  • Education: at least a Bachelor’s degree in a related field or equivalent experience
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to help with sponsorship once it makes an offer

Compensation and benefits

  • Annual salary range: $320,000 - $425,000 USD
  • Total compensation for full-time employees includes equity and benefits
  • Company highlights: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office collaboration spaces

Additional notes

  • Applications are reviewed on a rolling basis (no stated application deadline)
  • The posting encourages candidates from diverse and underrepresented groups to apply
  • Anthropic provides guidance on candidate AI usage and on its application process policies