Research Engineer, Machine Learning (Reinforcement Learning)

MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Kubernetes @ 3
  • Automated Testing @ 3
  • Python @ 5
  • Distributed Systems @ 3
  • Machine Learning @ 3
  • TensorFlow @ 3
  • Communication @ 6
  • Rust @ 3
  • Debugging @ 3
  • API @ 3
  • LLM @ 3
  • PyTorch @ 3
  • GPU @ 3
  • AI @ 3
  • Reinforcement Learning @ 3
  • Profiling @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Reinforcement Learning teams lead Anthropic's reinforcement learning research and development, contributing to Claude models and advancing capabilities such as autonomy, coding, tool use, and reasoning. This role blends research and engineering responsibilities to advance the capabilities and safety of large language models.

Responsibilities

  • Collaborate with researchers and engineers to implement and evaluate novel reinforcement learning approaches for large language models.
  • Architect and optimize core reinforcement learning infrastructure, including clean training abstractions and distributed experiment management across GPU clusters.
  • Design, implement, and test novel training environments, evaluations, and methodologies for reinforcement learning agents.
  • Drive performance improvements through profiling, optimization, and benchmarking; implement efficient caching and debug distributed systems to accelerate training and evaluation workflows.
  • Build prototypes for internal use, productivity, and evaluation, and collaborate across teams to develop automated testing frameworks and clean APIs.

Representative projects

  • Architect and optimize RL infrastructure from training abstractions to distributed experiment management across GPU clusters.
  • Design and implement novel training environments and evaluations for RL agents to push model capabilities.
  • Improve stack performance via profiling, optimization, caching, and debugging distributed systems.
  • Collaborate to develop automated testing frameworks, APIs, and scalable infrastructure to accelerate research.

Requirements

  • Proficiency in Python and async/concurrent programming (experience with frameworks like Trio).
  • Experience with machine learning frameworks such as PyTorch, TensorFlow, or JAX.
  • Industry experience in machine learning research; ability to balance research exploration with engineering implementation.
  • Strong systems design and communication skills, with attention to code quality, testing, and performance.
  • Experience with or interest in any of the following is valuable: reinforcement learning techniques and environments, LLM architectures and training methodologies, virtualization and sandboxed code execution environments, Kubernetes, distributed systems, or high-performance computing.
  • Enjoyment of pair programming and close collaboration with research and engineering teams.
  • Education: at least a Bachelor's degree in a related field or equivalent experience (required).

Strong candidates may have

  • Familiarity with LLM architectures and training methodologies.
  • Experience with reinforcement learning techniques and environments.
  • Experience with virtualization and sandboxed code execution environments.
  • Experience with Kubernetes, distributed systems, or high-performance computing.
  • Experience with Rust and/or C++.

Benefits

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office spaces for collaboration.

Logistics

  • Location: London, United Kingdom. Location-based hybrid policy: staff are expected to be in one of our offices at least 25% of the time.
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship is not guaranteed for every role or candidate.
  • Deadline to apply: None. Applications are reviewed on a rolling basis.
  • Annual salary: Not specified.

How to apply

  • Apply via the Greenhouse job page. Applicants should provide a resume or LinkedIn profile. Anthropic encourages candidates from diverse backgrounds to apply and provides guidance on acceptable AI usage during the application process.