Machine Learning Systems Engineer, Encodings and Tokenization

at Anthropic

📍 San Francisco, United States

USD 320,000-405,000 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 2
Algorithms @ 3
Distributed Systems @ 3
Machine Learning @ 3
Communication @ 3
Performance Optimization @ 3
Debugging @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. This role on the Encodings and Tokenization team will build and optimize tokenization and encoding systems used across Pretraining and Finetuning workflows, enabling more efficient and effective model training while supporting research needs.

Responsibilities

Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
Optimize encoding techniques to improve model training efficiency and performance
Collaborate closely with research teams to understand evolving needs around data representation
Build infrastructure that enables researchers to experiment with novel tokenization approaches
Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline
Create robust testing frameworks to validate tokenization systems across diverse languages and data types
Identify and address bottlenecks in data processing pipelines related to tokenization
Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams

Requirements

Significant software engineering experience with demonstrated machine learning expertise
Experience with machine learning systems, data pipelines, or ML infrastructure
Proficiency in Python and familiarity with modern ML development practices
Strong analytical skills and ability to evaluate the impact of engineering changes on research outcomes
Experience working independently and collaborating with cross-functional teams in rapidly evolving research environments
Comfort navigating ambiguity and building solutions that support research progress
Education: At least a Bachelor's degree in a related field or equivalent experience

Strong candidates may also have experience with

Working with machine learning data processing pipelines
Building or optimizing data encodings for ML applications
Implementing or working with BPE, WordPiece, or other tokenization algorithms
Performance optimization of ML data processing systems
Multi-language tokenization challenges and solutions
Distributed systems and parallel computing for ML workflows
Large language models or other transformer-based architectures (not required)

Logistics & Other Details

Location: San Francisco, CA (Anthropic headquarters)
Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time
Visa sponsorship: Anthropic sponsors visas and retains immigration legal support, though sponsorship success varies by role and candidate
Deadline to apply: None (applications reviewed on a rolling basis)

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours and a collaborative office environment

Notes

The team values communication, collaboration (including pair programming), and attention to the societal impacts of AI work. Applicants are encouraged to apply even if they do not match every listed qualification.