Machine Learning Systems Engineer, RL Engineering

USD 300,000-405,000 per year
Seniority: Mid-level
Work arrangement: Hybrid

Required Skills & Competences

  • Python @ 3
  • Algorithms @ 3
  • Distributed Systems @ 3
  • Machine Learning @ 3
  • Communication @ 3
  • LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We build systems that train models like Claude and focus on improving performance, robustness, and usability of training pipelines so research can progress quickly. As an ML Systems Engineer on the Reinforcement Learning Engineering team, you will build, maintain, and improve the algorithms and systems researchers use to train production and internal models using RLHF and related methods.

Responsibilities

  • Build, maintain, and improve critical algorithms and infrastructure used by finetuning and RL researchers.
  • Improve speed, reliability, and ease-of-use of training systems for large models.
  • Profile reinforcement learning and training pipelines to find opportunities for improvement.
  • Build systems that regularly launch training jobs in test environments to detect pipeline problems quickly.
  • Make changes so finetuning systems work on new model architectures.
  • Build instrumentation to detect and eliminate Python GIL contention in training code.
  • Diagnose and fix performance regressions (e.g., training runs slowing after a number of steps).
  • Implement stable, fast versions of new training algorithms proposed by researchers.
  • Pair program with researchers and engineers and support cross-team collaboration.

Requirements

  • 4+ years of software engineering experience.
  • Experience or strong interest in working on systems and tools that increase other people's productivity.
  • Results-oriented with a bias toward flexibility and impact; willing to pick up tasks outside a narrow job description.
  • Enjoy pair programming and collaborative research-driven development.
  • Desire to learn more about machine learning research, and care about the societal impacts of AI.

Strong candidates may also have experience with:

  • High-performance, large-scale distributed systems.
  • Large-scale LLM training and finetuning.
  • Implementing LLM finetuning algorithms such as RLHF.
  • Python (including diagnosing Python GIL issues).

Education:

  • At least a Bachelor's degree in a related field or equivalent experience is required.

Representative projects (examples)

  • Profiling the reinforcement learning pipeline to find and implement improvements.
  • Building automated test-environment training job launchers to detect pipeline regressions.
  • Adapting finetuning systems to support new model architectures.
  • Instrumenting training code to find Python GIL contention and eliminate it.
  • Diagnosing and fixing training slowdowns after many steps.
  • Implementing performant, stable versions of training algorithms from research proposals.

Benefits & Compensation

  • Annual salary (base): $300,000 - $405,000 USD.
  • Total compensation package may include equity, benefits, and incentive compensation.
  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration.

Logistics

  • Locations: San Francisco, CA; New York City, NY; Seattle, WA (United States).
  • Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time; some roles may require more time in offices.
  • Visa sponsorship: Anthropic sponsors visas in many cases and retains an immigration lawyer, though sponsorship is not guaranteed for every role or candidate.
  • Deadline to apply: None (applications reviewed on a rolling basis).

How to Apply / Other Notes

  • Anthropic encourages applicants who may not meet every listed qualification to apply.
  • The company emphasizes collaborative, high-impact research and values communication skills.
  • Guidance on candidate AI usage is available in Anthropic’s candidate AI policy.