Research Engineer, Machine Learning (Reinforcement Learning)

at Anthropic

📍 London, United Kingdom

Salary: Not specified

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 3
Automated Testing @ 3
Python @ 5
Distributed Systems @ 3
Machine Learning @ 3
TensorFlow @ 3
Communication @ 6
Rust @ 3
Debugging @ 3
API @ 3
LLM @ 3
PyTorch @ 3
GPU @ 3
AI @ 3
Reinforcement Learning @ 3
Profiling @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Reinforcement Learning teams lead Anthropic's reinforcement learning research and development, contributing to Claude models and advancing capabilities such as autonomy, coding, tool use, and reasoning. This role blends research and engineering responsibilities to advance the capabilities and safety of large language models.

Responsibilities

Collaborate with researchers and engineers to implement and evaluate novel reinforcement learning approaches for large language models.
Architect and optimize core reinforcement learning infrastructure, including clean training abstractions and distributed experiment management across GPU clusters.
Design, implement, and test novel training environments, evaluations, and methodologies for reinforcement learning agents.
Drive performance improvements through profiling, optimization, and benchmarking; implement efficient caching and debug distributed systems to accelerate training and evaluation workflows.
Build prototypes for internal use, productivity, and evaluation, and collaborate across teams to develop automated testing frameworks and clean APIs.

Representative projects

Architect and optimize RL infrastructure from training abstractions to distributed experiment management across GPU clusters.
Design and implement novel training environments and evaluations for RL agents to push model capabilities.
Improve stack performance via profiling, optimization, caching, and debugging distributed systems.
Collaborate to develop automated testing frameworks, APIs, and scalable infrastructure to accelerate research.

Requirements

Proficiency in Python and async/concurrent programming (experience with frameworks like Trio).
Experience with machine learning frameworks such as PyTorch, TensorFlow, or JAX.
Industry experience in machine learning research; ability to balance research exploration with engineering implementation.
Strong systems design and communication skills; care about code quality, testing, and performance.
Experience with or interest in any of the following is valuable: reinforcement learning techniques and environments, LLM architectures and training methodologies, virtualization and sandboxed code execution environments, Kubernetes, distributed systems, or high-performance computing.
Enjoyment of pair programming and close collaboration with research and engineering teams.
Education: at least a Bachelor's degree in a related field or equivalent experience (required).

Strong candidates may have

Familiarity with LLM architectures and training methodologies.
Experience with reinforcement learning techniques and environments.
Experience with virtualization and sandboxed code execution environments.
Experience with Kubernetes, distributed systems, or high-performance computing.
Experience with Rust and/or C++.

Benefits

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office spaces for collaboration.

Logistics

Location: London, United Kingdom. Location-based hybrid policy: staff are expected to be in one of our offices at least 25% of the time.
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship is not guaranteed for every role or candidate.
Deadline to apply: None. Applications are reviewed on a rolling basis.
Annual Salary: Not specified

How to apply

Apply via the Greenhouse job page. Applicants should provide a resume or LinkedIn profile. Anthropic encourages candidates from diverse backgrounds to apply and provides guidance on acceptable AI usage during the application process.