Research Scientist, Agentic Learning (Horizons)

at Anthropic

📍 San Francisco, United States

USD 300,000-405,000 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes @ 3, Automated Testing @ 3, Python @ 5, Distributed Systems @ 3, TensorFlow @ 3, Communication @ 6, Rust @ 3, Debugging @ 3, API @ 3, LLM @ 3, PyTorch @ 3

Details

Anthropic’s Horizons team conducts reinforcement learning research and development to advance the capabilities and safety of large language models. This role blends research and engineering: you will implement novel RL approaches, design and iterate on model architectures, build scalable RL infrastructure, and develop prototypes for agentic models and evaluations.

Responsibilities

Architect and optimize core reinforcement learning infrastructure, including training abstractions and distributed experiment management across clusters.
Design, implement, and test novel model architectures, training environments, evaluations, and methodologies for reinforcement learning agents.
Drive performance improvements through profiling, optimization, benchmarking, caching solutions, and debugging distributed systems to accelerate training and evaluation.
Collaborate with research and engineering teams to develop automated testing frameworks, clean APIs, and scalable infrastructure to support AI research and production transitions.
Create prototypes for internal use, productivity, and evaluation; work on improving model reasoning and tool use for open-ended tasks.

Requirements

Proficiency in Python.
Experience with both JAX and PyTorch.
Experience designing, implementing, and iterating on model architecture improvements.
Industry experience training and conducting ML research on production-scale LLMs.
Ability to balance research exploration with engineering implementation; care about code quality, testing, and performance.
Strong systems design and communication skills; comfortable pair programming and collaborating closely with cross-functional teams.
Commitment to building safe and beneficial AI systems.

Strong candidates may have

Experience with continuous learning / parameter-efficient fine-tuning approaches.
Experience with TensorFlow.
Experience with long-range LLM agent designs and reinforcement learning techniques/environments.
Experience with virtualization and sandboxed code execution environments.
Experience with Kubernetes and async frameworks such as trio.
Experience with distributed systems or high-performance computing.
Experience with Rust and/or C++.
Research experience and publication history.

Logistics

Education: At least a Bachelor's degree in a related field or equivalent experience is required.
Location & office policy: Location-based hybrid policy; staff are expected to be in one of Anthropic's offices at least 25% of the time.
Visa sponsorship: Anthropic sponsors visas where feasible and retains immigration counsel to assist.
Deadline: None — applications reviewed on a rolling basis.

Compensation

Annual salary range: $300,000 - $405,000 USD (as stated in the posting).

Benefits & Culture

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office space in San Francisco.
Emphasis on large-scale, high-impact research, frequent research discussions, and strong cross-team collaboration. Applicants are encouraged to apply even if they do not meet every qualification.