Used Tools & Technologies
Not specified
Required Skills & Competences
Python (6), GitHub (3), Machine Learning (3), Leadership (3), Communication (6), Technical Leadership (3)
Details
Anthropic is building reliable, interpretable, and steerable AI systems. The Research Engineer on the Model Evaluations team will lead the design and implementation of Anthropic's evaluation platform — a critical system that shapes how the company measures, understands, and improves model capabilities and safety. This role sits at the intersection of research and engineering and will directly influence training decisions and the model development roadmap. You will collaborate closely with training teams, alignment researchers, and safety teams to ensure models meet high standards prior to deployment.
Responsibilities
- Design novel evaluation methodologies to assess model capabilities across domains including reasoning, safety, helpfulness, and harmlessness.
- Lead the design and architecture of the evaluation platform, ensuring it scales with evolving model capabilities and research needs.
- Implement and maintain high-throughput evaluation pipelines that run during production training, providing real-time insights to guide training decisions (a minimal sketch follows this list).
- Analyze evaluation results to identify patterns, failure modes, and opportunities for model improvement; translate findings into actionable recommendations.
- Partner with research teams to develop domain-specific evaluations probing for emerging capabilities and potential risks.
- Build infrastructure enabling rapid iteration on evaluation design, supporting both automated and human-in-the-loop assessment approaches.
- Establish best practices and standards for evaluation development across the organization.
- Mentor team members and contribute to the growth of evaluation expertise at Anthropic.
- Coordinate evaluation efforts during critical training runs, ensuring comprehensive coverage and timely results.
- Contribute to research publications and external communications about evaluation methodologies and findings.
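To give a concrete flavor of the pipeline responsibility above, here is a minimal illustrative sketch only, not a description of Anthropic's actual system: it fans evaluation prompts out concurrently against a model endpoint and aggregates mean scores per task. The `query_model` call and `score_completion` grader are hypothetical placeholders.

```python
# Illustrative sketch of a concurrent evaluation loop.
# `query_model` and `score_completion` are hypothetical placeholders,
# not real Anthropic APIs.
import asyncio
from statistics import mean
from typing import NamedTuple


class EvalItem(NamedTuple):
    task: str       # e.g. "reasoning", "harmlessness"
    prompt: str
    reference: str  # expected answer used by the grader


async def query_model(prompt: str) -> str:
    """Placeholder for a call to the model checkpoint being evaluated."""
    await asyncio.sleep(0.01)  # stands in for model/network latency
    return "model output for: " + prompt


def score_completion(completion: str, reference: str) -> float:
    """Placeholder grader: 1.0 on exact match, else 0.0."""
    return float(completion.strip() == reference.strip())


async def evaluate(items: list[EvalItem], concurrency: int = 32) -> dict[str, float]:
    """Run all items with bounded concurrency and return the mean score per task."""
    sem = asyncio.Semaphore(concurrency)
    results: dict[str, list[float]] = {}

    async def run_one(item: EvalItem) -> None:
        async with sem:
            completion = await query_model(item.prompt)
        results.setdefault(item.task, []).append(
            score_completion(completion, item.reference)
        )

    await asyncio.gather(*(run_one(it) for it in items))
    return {task: mean(scores) for task, scores in results.items()}


if __name__ == "__main__":
    suite = [EvalItem("reasoning", "2 + 2 = ?", "4"),
             EvalItem("reasoning", "3 * 3 = ?", "9")]
    print(asyncio.run(evaluate(suite)))
```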
Requirements
- Experience designing and implementing evaluation systems for machine learning models, particularly large language models (LLMs).
- Demonstrated technical leadership experience, either in a formal role or through leading complex technical projects.
- Strong systems engineering and experimental design skills; comfortable building infrastructure while maintaining scientific rigor.
- Strong programming skills in Python.
- Experience with distributed computing frameworks and building high-throughput pipelines.
- Ability to translate research needs into engineering constraints and pragmatic solutions.
- Results-oriented and able to work in fast-paced environments where priorities shift based on research findings.
- Strong communication skills and ability to collaborate across research, training, and safety teams.
- Commitment to AI safety and understanding of societal impacts of deployed systems.
- Experience with statistical analysis and drawing conclusions from large-scale experimental data.
- Minimum: Bachelor's degree in a related field or equivalent experience.
Strong candidates may also have
- Experience running evaluations during production model training.
- Familiarity with safety evaluation frameworks and red teaming methodologies.
- Background in psychometrics, experimental psychology, or other measurement-focused fields.
- Experience with reinforcement learning evaluation or multi-agent systems.
- Contributions to open-source evaluation benchmarks or frameworks.
- Knowledge of prompt engineering and its role in evaluation design.
- Experience managing evaluation infrastructure at scale (thousands of experiments).
- Published research in ML evaluation, benchmarking, or related areas.
Representative projects
- Designing comprehensive evaluation suites assessing models across hundreds of capability dimensions.
- Building real-time evaluation dashboards for multi-week training runs.
- Developing novel evaluation approaches for emerging capabilities such as multi-step reasoning or tool use.
- Creating automated systems to detect regressions in model performance or safety properties (see the sketch after this list).
- Implementing efficient evaluation sampling strategies balancing coverage with compute constraints.
- Collaborating with external partners to develop industry-standard evaluation benchmarks.
- Building infrastructure to support human evaluation at scale, including quality control and aggregation systems.
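Several of these projects reduce to comparing metrics across checkpoints. As a hedged sketch of the regression-detection idea only, with made-up metric values and thresholds rather than Anthropic's tooling, the snippet below flags a task as regressed when the new checkpoint's mean score drops below the baseline by more than a chosen margin plus roughly two standard errors of sampling noise.

```python
# Illustrative regression check between two checkpoints; scores, margin,
# and z_threshold are invented for the example.
from math import sqrt
from statistics import mean, stdev


def regressed(baseline: list[float], current: list[float],
              margin: float = 0.02, z_threshold: float = 2.0) -> bool:
    """Flag a regression when the drop exceeds `margin` plus ~2 standard errors."""
    drop = mean(baseline) - mean(current)
    # Standard error of the difference in means (independent samples).
    se = sqrt(stdev(baseline) ** 2 / len(baseline) +
              stdev(current) ** 2 / len(current))
    return drop > margin + z_threshold * se


if __name__ == "__main__":
    baseline_scores = [0.91, 0.88, 0.93, 0.90, 0.92]
    current_scores = [0.78, 0.74, 0.80, 0.77, 0.79]
    print("regression detected:", regressed(baseline_scores, current_scores))
```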
Compensation and Logistics
- Annual Salary: $300,000 - $405,000 USD.
- Total compensation package may include equity, benefits, and incentive compensation.
- Education: Minimum Bachelor's degree in a related field or equivalent experience.
- Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time; some roles may require more in-office time.
- Visa sponsorship: Anthropic supports visa sponsorship where feasible and retains immigration counsel to assist.
Why Anthropic / How we're different
Anthropic focuses on large-scale empirical AI research as a single cohesive team. The organization values impact, collaboration, and communication. The team builds on prior work including GPT-3, circuit-based interpretability, multimodal neurons, scaling laws, AI & compute, concrete problems in AI safety, and learning from human preferences.
How to apply
Follow the application form on the listing. Anthropic encourages applicants from diverse backgrounds and asks candidates to review their AI usage guidance for the application process. The application requests either a resume or LinkedIn profile, and includes optional fields for publications, GitHub, and other materials.