Required Skills & Competences
Python @ 3, Machine Learning @ 6, Communication @ 3, Slack @ 3, API @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team includes researchers, engineers, policy experts, and business leaders working to build beneficial AI systems.
Responsibilities
- Design and implement reinforcement learning pipelines focused on virtual collaborator use cases such as productivity, organizational navigation, and vertical domains.
- Build and scale data creation platforms for generating high-quality, open-ended tasks involving domain experts and crowdworkers.
- Integrate real organizational data to create authentic training environments.
- Develop robust rubric-based evaluation systems to maintain quality and avoid reward hacking.
- Train Claude on advanced document manipulation, including understanding, enhancing, and co-creating documents.
- Collaborate closely with product teams to align training with shipped features.
Requirements
- Extensive Python programming experience with the ability to write reliable, high-quality code.
- Strong machine learning research expertise, particularly in reinforcement learning and fine-tuning.
- Ability to work at the intersection of research and product, solving real-world problems pragmatically.
- Comfortable managing ambiguity and balancing research rigor with shipping deadlines.
- Excellent collaboration skills across multiple teams (data operations, model training, product).
- Ability to switch context between research and engineering tasks.
- A commitment to making AI genuinely helpful for enterprise workflows.
Strongly Preferred Experience
- Building human-in-the-loop training systems or crowdsourcing platforms.
- Working with enterprise tools and APIs such as Google Workspace, Microsoft Office, and Slack.
- Developing evaluation frameworks for open-ended tasks.
- Domain expertise in finance, legal, or healthcare workflows.
- Creating scalable data pipelines with quality control.
- Expertise with reward modeling and preventing reward hacking in reinforcement learning systems.
- Translating product requirements into technical training objectives.
Benefits and Logistics
- Annual salary range: $315,000 - $560,000 USD.
- At least a Bachelor's degree in a related field or equivalent experience required.
- Hybrid location policy requiring at least 25% in-office presence at offices in San Francisco, New York City, or Seattle.
- Visa sponsorship is available with legal support.
- Inclusive and diverse team culture encouraging applications from all qualified candidates.
- Competitive compensation and benefits including optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
About Anthropic
Anthropic is a public benefit corporation headquartered in San Francisco, focused on high-impact AI research through large-scale collaborative efforts. It values communication and an empirical approach to science, and its research roots include GPT-3, interpretability, multimodal neurons, scaling laws, AI safety, and learning from human preferences.