Full-Stack Software Engineer, Reinforcement Learning

USD 300,000-405,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Docker @ 3, TypeScript @ 5, Python @ 5, GCP @ 3, CI/CD @ 3, AWS @ 3, Communication @ 3, React @ 5, API @ 3, QA @ 3, LLM @ 2, Audit @ 3, Observability @ 3, Reinforcement Learning @ 3

Details

Anthropic’s Reinforcement Learning (RL) organization builds the platforms, tools, and interfaces that power environment creation, large-scale data collection, and training observability for Claude. This role owns product surfaces end-to-end, from backend services and APIs to web UIs used by researchers, external vendors, and thousands of data labelers. You do not need an ML research background; the focus is on shipping polished, reliable full-stack systems that enable high-quality training data and scalable RL pipelines.

Responsibilities

  • Build and extend web platforms for RL environment creation, configuration, versioning, and validation workflows
  • Develop vendor-facing interfaces and tooling for external partners to create, submit, and iterate on training environments
  • Design and implement platforms for human data collection at scale: labeling workflows, QA systems, and feedback mechanisms to surface reward-signal integrity issues
  • Build evaluation dashboards and observability UIs giving researchers real-time insights into environment quality, training run health, and reward hacking
  • Create backend services and APIs connecting environment authoring tools, data collection systems, and RL training infrastructure
  • Build and expand scalable pipelines that generate code data, producing diverse programming tasks across languages and difficulty levels
  • Develop onboarding automation and documentation tooling for rapid ramp-up of vendors and internal users
  • Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products

Requirements

  • Strong software engineering fundamentals with real full-stack experience (database schema through frontend)
  • Proficiency in Python and a modern web stack (React, TypeScript, or similar)
  • Experience building reliable backend services and APIs
  • Demonstrated track record of shipping systems that solved hard problems and improved team throughput
  • Strong product and UX sensibility for both technical researchers and non-technical labelers
  • Excellent communication skills and ability to translate vague asks into scoped work
  • Ability to operate with high agency in a fast-moving environment

Preferred / Nice-to-Have

  • Experience building data collection, labeling, or annotation platforms at scale
  • Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows
  • Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines
  • Familiarity with LLM training, fine-tuning, or evaluation workflows
  • Experience with async Python (Trio, asyncio) or high-throughput API design
  • Background in dashboards, monitoring, or observability tooling
  • Experience working directly with external vendors or partners on technical integrations

Representative Projects

  • Unified platform for human data collection integrating labeling workflows, vendor management, and QA
  • Vendor onboarding automation handling Docker registry access, API token management, and environment validation
  • Evaluation and observability dashboards that detect reward hacks and measure environment difficulty
  • Environment quality review workflows and automated validation pipelines before production training

Compensation

  • Annual salary range: $300,000 - $405,000 USD

Logistics

  • Minimum education: Bachelor’s degree or equivalent combination of education/training/experience
  • Minimum years of experience: correlates with internal job level requirements
  • Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time (some roles may require more)
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship is not guaranteed for every role or candidate