Research Engineer, Frontier Red Team (RSP Evaluations)

USD 315,000-425,000 per year
MIDDLE
βœ… Hybrid

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ?

Python @ 3 Debugging @ 3 API @ 3 LLM @ 3

Details

Anthropic is building reliable, interpretable, and steerable AI systems. This production engineering role focuses on building and operating automated evaluation infrastructure that systematically tests frontier AI models for dangerous capabilities. You will create scalable, reliable evaluation pipelines, work closely with domain experts (e.g., biosecurity, cybersecurity), and operate systems during high-stakes model launches.

Responsibilities

  • Build and maintain automated evaluation systems using distributed infrastructure
  • Create robust evaluation pipelines that can run thousands of model capability tests
  • Develop tools that allow domain experts to quickly deploy new safety evaluations
  • Ensure evaluation systems run reliably during high-stakes model launches
  • Write production-quality Python code for evaluation infrastructure that scales
  • Monitor and operate evaluation systems during critical assessment periods
  • Collaborate with domain experts to translate safety requirements into technical implementations
  • Participate in high-stakes execution during model launch windows and operate monitoring/alerting systems

Requirements

  • Strong Python programming skills; ability to write clean, maintainable, production-quality code
  • Comfortable working with LLMs programmatically (APIs, prompting, output processing)
  • Experience debugging complex systems and resolving issues under pressure
  • Experience or familiarity with distributed infrastructure and containerized environments for production workloads
  • Knowledge of systems optimization and building efficient, scalable solutions
  • Experience building or working with evaluation frameworks, testing infrastructure, or automated assessment systems is highly relevant
  • Comfortable operating production systems and building monitoring/observability for evaluation pipelines
  • Interest in AI safety and responsible model development
  • Ability to work independently on 1–3 month projects while collaborating effectively with domain experts
  • Education: At least a Bachelor's degree in a related field or equivalent experience

Strong candidates may also have

  • Background in physics, systems engineering, or other fields requiring critical thinking about experimental results
  • Familiarity with adversarial testing, red-teaming, or finding edge cases in complex systems
  • Understanding of LLM capabilities and limitations
  • Experience with containerization and production operations

Representative projects

  • Build automated red-teaming systems that generate and evaluate thousands of adversarial prompts
  • Create evaluation pipelines that systematically test model capabilities across multiple risk domains
  • Develop monitoring infrastructure that tracks evaluation results and detects capability jumps
  • Implement reliable containerized environments for running large-scale model assessments
  • Build tools that allow biosecurity experts to quickly create and deploy new biological risk evaluations
  • Create automated analysis systems that process evaluation results and generate capability reports

Logistics & Benefits

  • Annual salary range: $315,000 - $425,000 USD
  • Location-based hybrid policy: staff expected to be in one of Anthropic's offices at least ~25% of the time (some roles may require more)
  • Visa sponsorship: Anthropic does sponsor visas in many cases and retains immigration counsel to assist
  • Education requirement: Bachelor's degree or equivalent experience
  • Benefits: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, collaborative office spaces

Other notes

  • Deadline: None (applications reviewed on a rolling basis)
  • Team mission: Ensure AI systems remain safe and beneficial by building reliable evaluation and red-teaming infrastructure