Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes @ 3, ETL @ 3, Machine Learning @ 6, Communication @ 6, PyTorch @ 3
Details
Anthropic is building reliable, interpretable, and steerable AI systems. This role focuses on building large-scale ML systems from the ground up with attention to safety, steerability, and trustworthiness. As a Research Engineer / Research Scientist, you will work across code and infrastructure: improving cluster reliability, increasing throughput and efficiency, designing and running scientific experiments, and improving developer tooling. You will write code informed by the research context and collaborate closely with researchers and engineers.
Responsibilities
- Design, implement, and maintain large-scale ML infrastructure and tooling.
- Improve cluster reliability, throughput, and compute efficiency for large training jobs.
- Design and run scientific experiments and scale distributed training jobs (including training on thousands of GPUs).
- Implement and evaluate model and system optimizations (e.g., new attention mechanisms, Transformer variants).
- Create datasets and ETL pipelines in formats models can consume.
- Write design docs for fault tolerance and system reliability strategies.
- Build interactive visualizations (for example, visualizing attention between tokens).
- Collaborate closely with researchers, participate in pair programming, and contribute to dev tooling.
Requirements
- Significant software engineering experience (Bachelor's degree in a related field or equivalent experience is required).
- Comfortable working on high-performance, large-scale ML systems.
- Experience or strong interest in machine learning research and its societal impacts.
- Ability to write clear design docs and communicate effectively in a collaborative research environment.
- Willingness to pair program and pick up tasks outside a narrow job description.
Strongly preferred / Nice-to-have
- Experience with GPUs and scaling compute for ML workloads.
- Experience with Kubernetes.
- Experience with PyTorch.
- Familiarity with OS internals.
- Experience with language modeling and Transformers.
- Experience with reinforcement learning.
- Experience building large-scale ETL and data pipelines.
Representative projects
- Optimizing throughput of a new attention mechanism.
- Comparing compute efficiency of two Transformer variants.
- Creating a Wikipedia dataset in a format models can easily consume.
- Scaling a distributed training job to thousands of GPUs.
- Writing design docs for fault tolerance strategies.
- Creating interactive visualizations of attention between tokens in a language model.
Logistics
- Annual base compensation (reported): $340,000 - $425,000 USD. Total compensation includes equity, benefits, and may include incentive compensation.
- Education: At least a Bachelor's degree in a related field or equivalent experience.
- Location-based hybrid policy: staff are expected to be in one of the offices at least 25% of the time (some roles may require more).
- Visa sponsorship: Anthropic does sponsor visas and retains an immigration lawyer; sponsorship is not guaranteed for every role or candidate, but reasonable efforts will be made if an offer is extended.
Benefits & culture
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office space.
- Emphasis on high-impact, large-scale research and strong communication across teams.
- Encouragement to apply even if you do not meet every qualification listed.