Software Engineer, RL Data
at Anthropic
📍 London, United Kingdom
📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States
USD 320,000-485,000 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 3
Kubernetes @ 3
TypeScript @ 6
Python @ 6
Communication @ 3
API @ 3
QA @ 3
LLM @ 3
Compliance @ 3
AI @ 3
Reinforcement Learning @ 3
Data Pipelines @ 3
Details
Anthropic’s RL Data team builds the systems that produce high-quality reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale. The team’s goal is to make Claude genuinely great at complex, real-world work and to point those capabilities at high-impact, beneficial uses (while acknowledging dual-use risks).
Responsibilities
- Own significant parts of the stack end-to-end, from technical architecture through the operational work that makes it succeed
- Build data collection pipelines, read and iterate on the transcripts they produce, and tune prompts, evals, and graders until outputs are high-quality
- Develop and improve QA frameworks to catch reward hacking and ensure environment quality
- Build interfaces that make collecting human data fast and painless for contributors
- Harden execution environments (sandboxing, snapshotting, tool coverage) so tasks hold up at training scale
- Embed with teams and domain experts who use the systems: design pipelines and evals with them, support them directly, and ship required improvements
- Work with operations, security, and compliance partners to roll systems out to new users and manage technical relationships with external data vendors
Minimum qualifications
- Strong software engineering skills and proficiency in at least one modern programming language (the team mainly uses Python and TypeScript)
- Experience designing, building, and running backend systems or infrastructure
- Effective use of AI tools in day-to-day work
- Willingness to own problems end-to-end, including non-engineering responsibilities
- Proactive, open communication and the ability to run a workstream and escalate early when needed
- Comfort iterating quickly in ambiguous, fast-changing situations
- Care about the societal impacts of your work
Preferred qualifications
- Experience building LLM-powered systems: prompt pipelines, evals, or products with models in the loop
- Experience with reinforcement learning on LLMs: creating environments, rewards, graders, or training data
- Time as a forward-deployed engineer, founder, or early startup engineer (roles that own outcomes)
- Experience shipping user-facing products or internal platforms: interviewing users, removing friction, improving experience
- Experience building data pipelines or integrations that move, transform, and index data from many sources
- Experience building connectors or integrations with third-party tools and APIs
- Experience with containers, Kubernetes, or simulation infrastructure
- Experience handling sensitive data or working under tight security controls
- Experience working with external data vendors
- Basic familiarity with AI safety or security research
Representative projects
- Make QA checks robust against a model that’s learning to game them
- Build a review flow that lets a busy expert check an RL task in under five minutes
- Reduce time from a rough task idea to a QA-passed RL task from days to hours
- Spend focused time with a team using the platform, then ship the most impactful fixes
- Harden sandboxed environments so tasks behave correctly across millions of rollouts
- Onboard a new data vendor and resolve practical integration issues
Logistics
- Annual salary: $320,000 - $485,000 USD
- Minimum education: Bachelor’s degree or equivalent combination of education, training, and/or experience
- Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
- Minimum years of experience: Will correlate with internal job level requirements for the position
- Location-based hybrid policy: Currently, staff are expected to be in one of our offices at least 25% of the time (some roles may require more time in office)
- Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to help, though not every role or candidate can be successfully sponsored
Company & culture
- Anthropic’s mission is to create reliable, interpretable, and steerable AI systems and to prioritize safety and societal impacts
- The company values collaborative, high-impact research and strong communication skills
- Benefits mentioned: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration
How to apply
- Application requires contact information and either a resume or LinkedIn profile; candidates are encouraged to apply even if they do not meet every qualification listed. Anthropic provides guidance on permitted AI usage during the application process.