Research Scientist, Agentic Learning (Horizons)

USD 300,000-405,000 per year
Mid-level
Hybrid

Required Skills & Competences

Kubernetes @ 3 Automated Testing @ 3 Python @ 5 Distributed Systems @ 3 TensorFlow @ 3 Communication @ 6 Rust @ 3 Debugging @ 3 API @ 3 LLM @ 3 PyTorch @ 3

Details

Anthropic’s Horizons team conducts reinforcement learning research and development to advance the capabilities and safety of large language models. This role blends research and engineering: you will implement novel RL approaches, design and iterate on model architectures, build scalable RL infrastructure, and develop prototypes for agentic models and evaluations.

Responsibilities

  • Architect and optimize core reinforcement learning infrastructure, including training abstractions and distributed experiment management across clusters.
  • Design, implement, and test novel model architectures, training environments, evaluations, and methodologies for reinforcement learning agents.
  • Drive performance improvements through profiling, optimization, benchmarking, caching, and debugging of distributed systems to accelerate training and evaluation.
  • Collaborate with research and engineering teams to develop automated testing frameworks, clean APIs, and scalable infrastructure that support AI research and its transition to production.
  • Create prototypes for internal use, productivity, and evaluation; work on improving model reasoning and tool use for open-ended tasks.

Requirements

  • Proficiency in Python.
  • Experience with both JAX and PyTorch.
  • Experience designing, implementing, and iterating on model architecture improvements.
  • Industry experience training and conducting ML research on production-scale LLMs.
  • Ability to balance research exploration with engineering implementation, and care for code quality, testing, and performance.
  • Strong systems design and communication skills; comfortable pair programming and collaborating closely with cross-functional teams.
  • Commitment to building safe and beneficial AI systems.

Strong candidates may have

  • Experience with continuous learning / parameter-efficient fine-tuning approaches.
  • Experience with TensorFlow.
  • Experience with long-range LLM agent designs and reinforcement learning techniques/environments.
  • Experience with virtualization and sandboxed code execution environments.
  • Experience with Kubernetes and async frameworks such as trio.
  • Experience with distributed systems or high-performance computing.
  • Experience with Rust and/or C++.
  • Research experience and publication history.

Logistics

  • Education: At least a Bachelor's degree in a related field or equivalent experience is required.
  • Location & office policy: Location-based hybrid policy; staff are expected to be in one of Anthropic's offices at least 25% of the time.
  • Visa sponsorship: Anthropic sponsors visas for roles where feasible and retains immigration counsel to assist.
  • Deadline: None; applications are reviewed on a rolling basis.

Compensation

  • Annual salary range: $300,000 - $405,000 USD (as stated in the posting).

Benefits & Culture

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office space in San Francisco.
  • Emphasis on large-scale, high-impact research, frequent research discussions, and strong cross-team collaboration. Applicants are encouraged to apply even if they do not meet every qualification.