Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes @ 3, Automated Testing @ 3, Python @ 5, Distributed Systems @ 3, TensorFlow @ 3, Communication @ 6, Rust @ 3, Debugging @ 3, API @ 3, LLM @ 3, PyTorch @ 3
Details
Anthropic’s Horizons team conducts reinforcement learning research and development to advance capabilities and safety of large language models. This role blends research and engineering: you will implement novel RL approaches, design and iterate on model architectures, build scalable RL infrastructure, and develop prototypes for agentic models and evaluations.
Responsibilities
- Architect and optimize core reinforcement learning infrastructure, including training abstractions and distributed experiment management across clusters.
- Design, implement, and test novel model architectures, training environments, evaluations, and methodologies for reinforcement learning agents.
- Drive performance improvements through profiling, optimization, benchmarking, caching solutions, and debugging distributed systems to accelerate training and evaluation.
- Collaborate with research and engineering teams to develop automated testing frameworks, clean APIs, and scalable infrastructure to support AI research and production transitions.
- Create prototypes for internal use, productivity, and evaluation; work on improving model reasoning and tool use for open-ended tasks.
Requirements
- Proficiency in Python.
- Experience with both JAX and PyTorch.
- Experience designing, implementing, and iterating on model architecture improvements.
- Industry experience training and conducting ML research on production-scale LLMs.
- Ability to balance research exploration with engineering implementation; care about code quality, testing, and performance.
- Strong systems design and communication skills; comfortable pair programming and collaborating closely with cross-functional teams.
- Commitment to building safe and beneficial AI systems.
Strong candidates may have
- Experience with continuous learning / parameter-efficient fine-tuning approaches.
- Experience with TensorFlow.
- Experience with long-range LLM agent designs and reinforcement learning techniques/environments.
- Experience with virtualization and sandboxed code execution environments.
- Experience with Kubernetes and async frameworks such as trio.
- Experience with distributed systems or high-performance computing.
- Experience with Rust and/or C++.
- Research experience and publication history.
Logistics
- Education: At least a Bachelor's degree in a related field or equivalent experience is required.
- Location & office policy: Location-based hybrid policy; staff are expected to be in one of Anthropic's offices at least ~25% of the time.
- Visa sponsorship: Anthropic sponsors visas where feasible and retains immigration counsel to assist.
- Deadline: None; applications are reviewed on a rolling basis.
Compensation
- Annual salary range: $300,000 - $405,000 USD (as stated in the posting).
Benefits & Culture
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office space in San Francisco.
- Emphasis on large-scale, high-impact research, frequent research discussions, and strong cross-team collaboration. Applicants are encouraged to apply even if they do not meet every qualification.