Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 6
Hiring @ 3
Debugging @ 3
LLM @ 3
PyTorch @ 3
CUDA @ 3
GPU @ 3
AI @ 3
Reinforcement Learning @ 3
Profiling @ 3
Details
Anthropic is hiring for the Code RL team within the Reinforcement Learning organization to advance models' ability to write, edit, test, debug, and ship real software end-to-end on real codebases with real tools. The role blends research and engineering: design RL environments and coding tasks, build reward signals and verifiers that capture what "good code" means, run training experiments on frontier models, diagnose model behavior, and improve the speed and reliability of training and evaluation pipelines. Code RL work spans agentic coding behaviors, code correctness, long-horizon autonomous engineering, and high-performance code for accelerators.
Responsibilities
- Design and implement RL environments, coding tasks, reward signals, and verifiers to evaluate code quality and behavior
- Run and analyze training experiments on large models; diagnose why models do or do not improve at software-engineering tasks
- Build, profile, and optimize pipelines and infrastructure to enable fast, reliable iteration of experiments at scale
- Own systems end-to-end and debug across the stack, from tooling and sandboxes to training jobs and verifiers
- Collaborate with alignment, red-team, and applied production training teams to ensure capabilities are effective and safe
Requirements
- Strong software-engineering skills and deep Python expertise, including async/concurrent programming
- Comfortable owning systems end-to-end and debugging across the stack
- Ability to balance research exploration with engineering implementation and to engage rigorously in designing experiments and interpreting results
- Care about code quality, testing, and performance
- Passionate about developing safe and beneficial AI systems
Strong candidates may also have
- Experience with reinforcement learning, RLHF, post-training, or LLM fine-tuning
- Experience building coding agents, code-execution sandboxes, evaluation harnesses, verifiers, or developer tooling
- Background in program analysis, testing, verification, compilers, or formal methods
- Experience with PyTorch and large-scale distributed training; performance profiling and optimization of ML systems
- CUDA / GPU or TPU kernel experience and accelerator-performance intuition
- Experience with virtualization and sandboxed code execution environments
Logistics
- Locations: San Francisco, CA and New York City, NY (United States)
- Location-based hybrid policy: staff are expected to be in one of the offices at least 25% of the time
- Minimum education: Bachelor's degree or equivalent combination of education/training/experience
- Visa sponsorship: Anthropic states that it sponsors visas, will make reasonable efforts to do so, and retains an immigration lawyer to assist
- Annual salary range: $500,000 - $850,000 USD
Compensation & Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Office space for collaboration
About Anthropic & RL Teams
- Mission: create reliable, interpretable, and steerable AI systems that are safe and beneficial
- RL teams work on enabling models to use computers effectively, advancing code generation via RL, RL research for LLMs, scalable RL infrastructure/training, and enhancing model reasoning capabilities
- Emphasis on collaboration with alignment/frontier red teams and applying research at scale
How to Apply
- Apply via Anthropic's careers page; follow their guidance on candidate AI usage and beware of scams (recruiters contact candidates only from @anthropic.com email addresses).