Machine Learning Systems Engineer, RL Engineering

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 300,000-405,000 per year

MIDDLE

✅ Hybrid

Required Skills & Competences

Python @ 3
Algorithms @ 3
Distributed Systems @ 3
Machine Learning @ 3
Communication @ 3
LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

You want to build the cutting-edge systems that train AI models like Claude. You're excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable and steerable AI. As an ML Systems Engineer on our Reinforcement Learning Engineering team, you'll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety. You'll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible. You're energized by the challenge of supporting and empowering our research team in the mission to build beneficial AI systems.

Our finetuning researchers train our production Claude models and internal research models using RLHF and related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models. You’ll be responsible for improving the speed, reliability, and ease of use of these systems.

Responsibilities

Build, maintain, and improve algorithms and infrastructure used to train models (production and research).
Improve the speed, reliability, and ease of use of finetuning and RL training systems.
Profile reinforcement learning pipelines and identify opportunities for performance improvements.
Build systems that regularly launch training jobs in test environments to quickly detect pipeline issues.
Make changes to finetuning systems to support new model architectures.
Build instrumentation to detect and eliminate Python GIL contention in training code.
Diagnose and fix performance regressions in training runs.
Implement stable, fast versions of new training algorithms proposed by researchers.
Collaborate closely with finetuning researchers and other engineering teams; engage in pair programming and design reviews.

Requirements

4+ years of software engineering experience.
Experience or strong interest in systems and tools that increase researcher productivity.
Results-oriented with a bias toward flexibility and impact; willingness to take on tasks beyond strict job boundaries.
Enjoy pair programming and collaborative development practices.
Interest in and desire to learn more about machine learning research.
Care about the societal impacts of AI systems.

Strong candidates may also have experience with:

High performance, large-scale distributed systems.
Large-scale LLM training and finetuning.
Python.
Implementing LLM finetuning algorithms such as RLHF.

Education & logistics:

At least a Bachelor's degree in a related field or equivalent experience is required.
Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more).
Visa sponsorship: Anthropic does sponsor visas and retains immigration counsel; sponsorship success may vary by role and candidate.

Representative projects

Profiling the reinforcement learning pipeline to find performance improvements.
Building a system to regularly launch training jobs in test environments to detect problems quickly.
Adapting finetuning systems to new model architectures.
Instrumenting training code to detect and eliminate Python GIL contention.
Diagnosing and fixing training slowdowns that occur after many steps.
Implementing stable, fast versions of new training algorithms proposed by researchers.

Benefits

Competitive compensation and benefits.
Equity and optional equity donation matching.
Generous vacation and parental leave.
Flexible working hours and a collaborative office space.

How we're different

Anthropic treats high-impact AI research as big science and focuses on a few large-scale research efforts as a cohesive team.
The team emphasizes collaboration and communication skills, and pursues empirical AI research directions (e.g., GPT-3, interpretability, scaling laws, learning from human preferences).

Application

Applications are reviewed on a rolling basis; there is no deadline to apply. Applicants are encouraged to apply even if they don't meet every qualification.