Software Engineer, RL Data

at Anthropic

📍 London, United Kingdom
📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 320,000-485,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. Proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 3 Kubernetes @ 3 TypeScript @ 6 Python @ 6 Communication @ 3 API @ 3 QA @ 3 LLM @ 3 Compliance @ 3 AI @ 3 Reinforcement Learning @ 3 Data Pipelines @ 3

Details

Anthropic’s RL Data team builds the systems that produce high-quality reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale. The team’s goal is to make Claude genuinely great at complex, real-world work and to point those capabilities at high-impact, beneficial uses (while acknowledging dual-use risks).

Responsibilities

Own significant parts of the stack end-to-end, from technical architecture through the operational work that makes them succeed
Build data collection pipelines, read and iterate on the transcripts they produce, and tune prompts, evals, and graders until outputs are high-quality
Develop and improve QA frameworks to catch reward hacking and ensure environment quality
Build interfaces that make collecting human data fast and painless for contributors
Harden execution environments (sandboxing, snapshotting, tool coverage) so tasks hold up at training scale
Embed with teams and domain experts who use the systems: design pipelines and evals with them, support them directly, and ship required improvements
Work with operations, security, and compliance partners to roll systems out to new users and manage technical relationships with external data vendors

Minimum qualifications

Strong software engineering skills and proficiency in at least one modern programming language (the team mainly uses Python and TypeScript)
Experience designing, building, and running backend systems or infrastructure
Effective use of AI tools in day-to-day work
Willingness to own problems end-to-end, including non-engineering responsibilities
Proactive, open communication and the ability to run a workstream and escalate early when needed
Comfort iterating quickly in ambiguous, fast-changing situations
Genuine care about the societal impacts of your work

Preferred qualifications

Experience building LLM-powered systems: prompt pipelines, evals, or products with models in the loop
Experience with reinforcement learning on LLMs: creating environments, rewards, graders, or training data
Time as a forward-deployed engineer, founder, or early startup engineer (roles where you owned outcomes)
Experience shipping user-facing products or internal platforms: interviewing users, removing friction, improving experience
Experience building data pipelines or integrations that move, transform, and index data from many sources
Experience building connectors or integrations with third-party tools and APIs
Experience with containers, Kubernetes, or simulation infrastructure
Experience handling sensitive data or working under tight security controls
Experience working with external data vendors
Basic familiarity with AI safety or security research

Representative projects

Make QA checks robust against a model that’s learning to game them
Build a review flow that lets a busy expert check an RL task in under five minutes
Reduce time from a rough task idea to a QA-passed RL task from days to hours
Spend focused time with a team using the platform, then ship the most impactful fixes
Harden sandboxed environments so tasks behave correctly across millions of rollouts
Onboard a new data vendor and resolve practical integration issues

Logistics

Annual salary: $320,000 - $485,000 USD
Minimum education: Bachelor’s degree or equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Minimum years of experience: Depends on the internal job level for the position
Location-based hybrid policy: Currently, staff are expected to be in one of our offices at least 25% of the time (some roles may require more time in office)
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to help, though not every role or candidate can be successfully sponsored

Company & culture

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems and to prioritize safety and societal impacts
The company values collaborative, high-impact research and strong communication skills
Benefits mentioned: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration

How to apply

Application requires contact information and either a resume or LinkedIn profile; candidates are encouraged to apply even if they do not meet every qualification listed. Anthropic provides guidance on permitted AI usage during the application process.