Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 2, Algorithms @ 3, Distributed Systems @ 3, Machine Learning @ 3, Communication @ 3, Performance Optimization @ 3, Debugging @ 3
Details
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
Role overview
We are seeking an experienced Machine Learning Systems Engineer to join our Encodings and Tokenization team. This cross-functional role will be instrumental in developing and optimizing the encodings and tokenization systems used throughout our finetuning workflows. As a bridge between Pretraining and Finetuning teams, you will build infrastructure that directly impacts how models learn from and interpret data, enabling more efficient and effective training while helping ensure models remain reliable, interpretable, and steerable.
Responsibilities
- Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
- Optimize encoding techniques to improve model training efficiency and performance
- Collaborate closely with research teams to understand evolving needs around data representation
- Build infrastructure that enables researchers to experiment with novel tokenization approaches
- Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline
- Create robust testing frameworks to validate tokenization systems across diverse languages and data types
- Identify and address bottlenecks in data processing pipelines related to tokenization
- Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams
Requirements
- Significant software engineering experience with demonstrated machine learning expertise
- Proficiency in Python and familiarity with modern ML development practices
- Experience with machine learning systems, data pipelines, or ML infrastructure
- Strong analytical skills and ability to evaluate the impact of engineering changes on research outcomes
- Ability to work independently and collaboratively in rapidly evolving research environments
- Bachelor's degree in a related field or equivalent experience (required)
- Expectation to be in one of Anthropic's offices at least ~25% of the time (location-based hybrid policy)
Strong Candidates May Also Have Experience With
- Working with machine learning data processing pipelines
- Building or optimizing data encodings for ML applications
- Implementing or working with BPE, WordPiece, or other tokenization algorithms
- Performance optimization of ML data processing systems
- Multi-language tokenization challenges and solutions
- Distributed systems and parallel computing for ML workflows
- Large language models or other transformer-based architectures (not required)
Compensation & Logistics
Annual base salary range: $320,000 - $405,000 USD. Total compensation includes equity, benefits, and may include incentive compensation. Visa sponsorship is available in some cases. Applications are reviewed on a rolling basis.
Benefits & Culture
Anthropic is a public benefit corporation headquartered in San Francisco. The company offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces. It values communication, collaboration, and impact-driven research, and encourages applications from candidates who may not meet every listed qualification.