Full-Stack Software Engineer, Reinforcement Learning

at Anthropic

📍 New York City, United States
📍 San Francisco, United States

USD 300,000-405,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. Proficiency levels are defined as follows:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and know all the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Docker @ 3, TypeScript @ 5, Python @ 5, GCP @ 3, CI/CD @ 3, AWS @ 3, Communication @ 3, React @ 5, API @ 3, QA @ 3, LLM @ 2, Audit @ 3, Observability @ 3, Reinforcement Learning @ 3

Details

Anthropic’s Reinforcement Learning (RL) organization builds the platforms, tools, and interfaces that power environment creation, large-scale data collection, and training observability for Claude. This role owns product surfaces end-to-end, from backend services and APIs to web UIs used by researchers, external vendors, and thousands of data labelers. You do not need an ML research background; the focus is on shipping polished, reliable full-stack systems that enable high-quality training data and scalable RL pipelines.

Responsibilities

Build and extend web platforms for RL environment creation, configuration, versioning, and validation workflows
Develop vendor-facing interfaces and tooling for external partners to create, submit, and iterate on training environments
Design and implement platforms for human data collection at scale: labeling workflows, QA systems, and feedback mechanisms to surface reward-signal integrity issues
Build evaluation dashboards and observability UIs giving researchers real-time insights into environment quality, training run health, and reward hacking
Create backend services and APIs connecting environment authoring tools, data collection systems, and RL training infrastructure
Build and expand scalable code data generation pipelines producing diverse programming tasks across languages and difficulty levels
Develop onboarding automation and documentation tooling for rapid ramp-up of vendors and internal users
Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products

Requirements

Strong software engineering fundamentals with real full-stack experience (database schema through frontend)
Proficiency in Python and a modern web stack (React, TypeScript, or similar)
Experience building reliable backend services and APIs
Demonstrated track record of shipping systems that solved hard problems and improved team throughput
Strong product and UX sensibility for both technical researchers and non-technical labelers
Excellent communication skills and ability to translate vague asks into scoped work
Ability to operate with high agency in a fast-moving environment

Preferred / Nice-to-Have

Experience building data collection, labeling, or annotation platforms at scale
Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows
Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines
Familiarity with LLM training, fine-tuning, or evaluation workflows
Experience with async Python (Trio, asyncio) or high-throughput API design
Background in dashboards, monitoring, or observability tooling
Experience working directly with external vendors or partners on technical integrations

Representative Projects

Unified platform for human data collection integrating labeling workflows, vendor management, and QA
Vendor onboarding automation handling Docker registry access, API token management, and environment validation
Evaluation and observability dashboards that detect reward hacks and measure environment difficulty
Environment quality review workflows and automated validation pipelines before production training

Compensation

Annual salary range: $300,000 - $405,000 USD

Logistics

Minimum education: Bachelor’s degree or equivalent combination of education/training/experience
Minimum years of experience: correlates with internal job level requirements
Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more)
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship is not guaranteed for every role or candidate