Machine Learning Systems Engineer, Encodings and Tokenization

USD 320,000-405,000 per year
MIDDLE
✅ Hybrid

Required Skills & Competences

  • Python @ 2
  • Algorithms @ 3
  • Distributed Systems @ 3
  • Machine Learning @ 3
  • Communication @ 3
  • Performance Optimization @ 3
  • Debugging @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. This role on the Encodings and Tokenization team will build and optimize tokenization and encoding systems used across Pretraining and Finetuning workflows, enabling more efficient and effective model training while supporting research needs.

Responsibilities

  • Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
  • Optimize encoding techniques to improve model training efficiency and performance
  • Collaborate closely with research teams to understand evolving needs around data representation
  • Build infrastructure that enables researchers to experiment with novel tokenization approaches
  • Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline
  • Create robust testing frameworks to validate tokenization systems across diverse languages and data types
  • Identify and address bottlenecks in data processing pipelines related to tokenization
  • Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams

Requirements

  • Significant software engineering experience with demonstrated machine learning expertise
  • Experience with machine learning systems, data pipelines, or ML infrastructure
  • Proficiency in Python and familiarity with modern ML development practices
  • Strong analytical skills and ability to evaluate the impact of engineering changes on research outcomes
  • Experience working independently and collaborating with cross-functional teams in rapidly evolving research environments
  • Comfort navigating ambiguity and building solutions that support research progress
  • Education: At least a Bachelor's degree in a related field or equivalent experience

Strong candidates may also have experience with

  • Working with machine learning data processing pipelines
  • Building or optimizing data encodings for ML applications
  • Implementing or working with BPE, WordPiece, or other tokenization algorithms
  • Performance optimization of ML data processing systems
  • Multi-language tokenization challenges and solutions
  • Distributed systems and parallel computing for ML workflows
  • Large language models or other transformer-based architectures (not required)

Logistics & Other Details

  • Location: San Francisco, CA (Anthropic headquarters)
  • Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time
  • Visa sponsorship: Anthropic sponsors visas and retains immigration legal support, though sponsorship success varies by role and candidate
  • Deadline to apply: None (applications reviewed on a rolling basis)

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours and a collaborative office environment

Notes

  • The team values communication, collaboration (including pair programming), and attention to the societal impacts of AI work. Applicants are encouraged to apply even if they do not match every listed qualification.