ML/Research Engineer, Safeguards

at Anthropic

📍 New York City, United States
📍 San Francisco, United States

USD 350,000-500,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ^?

Python @ 5 Machine Learning @ 3 Hiring @ 3 Communication @ 6

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Safeguards ML team builds systems to detect and mitigate misuse of AI systems, protect user wellbeing, and ensure models behave appropriately across contexts. This role works across the research-to-deployment pipeline to create classifiers, monitoring systems, threat models, and mitigations that keep products safe as capabilities advance.

Responsibilities

Develop classifiers to detect misuse and anomalous behavior at scale, including building synthetic data pipelines for training classifiers and methods to automatically source representative evaluations to iterate on.
Build systems to monitor for harms that span multiple exchanges (e.g., coordinated cyber attacks and influence operations) and develop methods for aggregating and analyzing signals across contexts.
Evaluate and improve the safety of agentic products by developing threat models, creating environments to test for agentic risks, and developing and deploying mitigations for prompt injection attacks.
Conduct research on automated red-teaming, adversarial robustness, and other research that helps test for or find misuse.

Requirements

4+ years of experience in ML engineering, research engineering, or applied research (academia or industry).
Proficiency in Python and experience building ML systems.
Comfortable working across the research-to-deployment pipeline, from exploratory experiments to production systems.
Concern for misuse risks of AI systems and motivation to mitigate them.
Strong communication skills and ability to explain complex technical concepts to non-technical stakeholders.
Education: at least a Bachelor's degree in a related field or equivalent experience.

Strong candidates may also have experience with

Language modeling and transformers.
Building classifiers, anomaly detection systems, or behavioral ML.
Adversarial machine learning or red-teaming.
Interpretability or probes.
Reinforcement learning.
High-performance, large-scale ML systems.

Compensation

Annual Salary: $350,000 - $500,000 USD. Total compensation for full-time employees includes equity and benefits.

Logistics

Location: San Francisco, CA or New York City, NY.
Location-based hybrid policy: staff expected to be in one of our offices at least 25% of the time (some roles may require more office time).
Visa sponsorship: Anthropic does sponsor visas and retains an immigration lawyer to assist, although sponsorship success may vary by role and candidate.
We encourage applicants who may not meet every qualification to apply.

Benefits & Culture

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration.
Collaborative research-driven environment focused on high-impact AI safety and trustworthy, steerable systems.
Guidance for candidates on permitted AI usage during the hiring process is provided.