Research Engineer, Virtual Collaborator

USD 315,000-560,000 per year
MIDDLE
✅ Hybrid

Required Skills & Competences

Python @ 3, Machine Learning @ 6, Communication @ 3, Slack @ 3, API @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. This role focuses on training Claude for virtual collaborator workflows by designing and implementing reinforcement learning environments, building data creation platforms, integrating organizational data, and developing evaluation systems to ensure models are robust and helpful for real-world knowledge work.

Responsibilities

  • Design and implement reinforcement learning pipelines targeted at virtual collaborator use cases (productivity, organizational navigation, vertical domains).
  • Build and scale a data-creation platform to generate high-quality, open-ended tasks with domain experts and crowdworkers.
  • Integrate real organizational data to create authentic training environments (internal knowledge, document workflows, financial models, etc.).
  • Develop robust rubric-based evaluation systems that maintain quality while avoiding reward hacking.
  • Train Claude on advanced document manipulation: understanding, enhancing, and co-creating documents.
  • Partner directly with product teams to ensure training aligns with shipped features and product requirements.

Requirements

  • Very experienced Python programmer able to produce reliable, high-quality code.
  • Strong machine learning research experience, particularly in reinforcement learning and fine-tuning.
  • Experience at the intersection of research and product, with a pragmatic approach to shipping real-world solutions.
  • Comfortable with ambiguity and able to balance research rigor with shipping deadlines.
  • Ability to collaborate across teams (data operations, model training, product) and context-switch between research and engineering tasks.
  • At least a Bachelor’s degree in a related field or equivalent experience.

Strong candidates will also have experience with

  • Building human-in-the-loop training systems or crowdsourcing platforms.
  • Working with enterprise tools and APIs (Google Workspace, Microsoft Office, Slack, etc.).
  • Developing evaluation frameworks for open-ended tasks and reward modeling to prevent reward hacking.
  • Domain expertise in finance, legal, or healthcare workflows.
  • Creating scalable data pipelines with quality-control mechanisms and data operations workflows.
  • Translating product requirements into technical training objectives.

Logistics

  • Location: San Francisco, CA; New York City, NY; Seattle, WA.
  • Location-based hybrid policy: staff are currently expected to be in an office at least 25% of the time.
  • Visa sponsorship: Anthropic sponsors visas and retains immigration support, though sponsorship may not succeed for every role.
  • Education: Minimum Bachelor’s degree in a related field or equivalent experience.

Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours and a collaborative office environment.

About Anthropic

Anthropic is a public benefit corporation focused on building steerable, trustworthy AI through large-scale empirical research. The team values collaboration, communication, and working on high-impact AI research problems. Applicants are encouraged to apply even if they don’t meet every qualification.