Machine Learning Systems Engineer, RL Engineering

at Anthropic

📍 San Francisco, United States

USD 300,000-405,000 per year

Mid-level

✅ Hybrid

Required Skills & Competences

Python @ 3
Algorithms @ 3
Distributed Systems @ 3
Machine Learning @ 3
LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The team is a growing group of researchers, engineers, policy experts, and business leaders working to build beneficial AI systems.

Role overview

As an ML Systems Engineer on the Reinforcement Learning Engineering team, you will build and improve the algorithms and infrastructure that researchers use to train models, both production Claude models and internal research models. You will focus on improving the performance, robustness, and usability of training systems so that research can progress quickly. Your work will directly enable advances in AI capabilities and safety.

Responsibilities

Build, maintain, and improve algorithms and systems used for finetuning and RL training.
Improve speed, reliability, and ease-of-use of training pipelines and tooling.
Profile the reinforcement learning pipeline to find opportunities for improvement.
Build systems that regularly launch training jobs in test environments to quickly detect pipeline problems.
Modify finetuning systems to support new model architectures.
Build instrumentation to detect and eliminate Python GIL contention in training code.
Diagnose and fix performance regressions (e.g., training runs that slow down after a number of steps).
Implement stable, fast versions of new training algorithms proposed by researchers.

Requirements

At least a Bachelor's degree in a related field or equivalent experience.
4+ years of software engineering experience.
Experience working on systems and tools that increase other people's productivity.
Results-oriented with a bias toward flexibility and impact; willing to pick up work beyond the strict job description.
Comfortable pair programming.
A desire to learn more about machine learning research and genuine care about the societal impacts of AI.

Strong candidates may also have experience with:

High-performance, large-scale distributed systems
Large-scale LLM training
Python
Implementing LLM finetuning algorithms (such as RLHF)

Representative projects

Profiling RL pipelines to find opportunities for improvement.
Building test-environment job-launching systems for pipeline validation.
Adapting finetuning systems for new model architectures.
Instrumenting code to reduce Python GIL contention.
Diagnosing and fixing training slowdowns.
Implementing new training algorithms from researchers.

Logistics

Location: San Francisco, CA.
Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time; some roles may require more office time.
Education: at least a Bachelor's degree in a related field or equivalent experience.
Visa sponsorship: Anthropic sponsors visas where feasible and retains an immigration lawyer to assist, though not every role or candidate can be successfully sponsored.
Deadline to apply: None (applications reviewed on a rolling basis).

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
Office space for collaboration

Other

Anthropic encourages applicants who may not meet every listed qualification to apply.
Anthropic provides guidance on candidates' use of AI during the application process.