Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Docker @ 3
TypeScript @ 5
Python @ 5
GCP @ 3
CI/CD @ 3
AWS @ 3
Communication @ 3
React @ 5
API @ 3
QA @ 3
LLM @ 2
Audit @ 3
Observability @ 3
Reinforcement Learning @ 3
Details
Anthropic’s Reinforcement Learning (RL) organization builds the platforms, tools, and interfaces that power environment creation, large-scale data collection, and training observability for Claude. This role owns product surfaces end-to-end — from backend services and APIs to web UIs used by researchers, external vendors, and thousands of data labelers. You do not need an ML research background; the focus is on shipping polished, reliable full‑stack systems that enable high-quality training data and scalable RL pipelines.
Responsibilities
- Build and extend web platforms for RL environment creation, configuration, versioning, and validation workflows
- Develop vendor-facing interfaces and tooling for external partners to create, submit, and iterate on training environments
- Design and implement platforms for human data collection at scale: labeling workflows, QA systems, and feedback mechanisms to surface reward-signal integrity issues
- Build evaluation dashboards and observability UIs giving researchers real-time insights into environment quality, training run health, and reward hacking
- Create backend services and APIs connecting environment authoring tools, data collection systems, and RL training infrastructure
- Build and expand scalable code data generation pipelines producing diverse programming tasks across languages and difficulty levels
- Develop onboarding automation and documentation tooling for rapid ramp-up of vendors and internal users
- Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products
Requirements
- Strong software engineering fundamentals with real full-stack experience (database schema through frontend)
- Proficiency in Python and a modern web stack (React, TypeScript, or similar)
- Experience building reliable backend services and APIs
- Demonstrated track record of shipping systems that solved hard problems and improved team throughput
- Strong product and UX sensibility for both technical researchers and non-technical labelers
- Excellent communication skills and ability to translate vague asks into scoped work
- Ability to operate with high agency in a fast-moving environment
Preferred / Nice-to-Have
- Experience building data collection, labeling, or annotation platforms at scale
- Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows
- Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines
- Familiarity with LLM training, fine-tuning, or evaluation workflows
- Experience with async Python (Trio, asyncio) or high-throughput API design
- Background in dashboards, monitoring, or observability tooling
- Experience working directly with external vendors or partners on technical integrations
Representative Projects
- Unified platform for human data collection integrating labeling workflows, vendor management, and QA
- Vendor onboarding automation handling Docker registry access, API token management, and environment validation
- Evaluation and observability dashboards that detect reward hacks and measure environment difficulty
- Environment quality review workflows and automated validation pipelines before production training
Compensation
- Annual salary range: $300,000 - $405,000 USD
Logistics
- Minimum education: Bachelor’s degree or equivalent combination of education/training/experience
- Minimum years of experience: correlates with internal job level requirements
- Location-based hybrid policy: staff expected to be in one of our offices at least 25% of the time (some roles may require more)
- Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship is not guaranteed for every role or candidate