Machine Learning Systems Engineer, Research Tools

USD 320,000-405,000 per year
MIDDLE
✅ Hybrid

Required Skills & Competences

Python @ 2, Algorithms @ 3, Distributed Systems @ 3, Machine Learning @ 3, Communication @ 6, Performance Optimization @ 3, Debugging @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Encodings and Tokenization team develops and optimizes the encodings and tokenization systems used across Pretraining and Finetuning workflows. This role bridges the Pretraining and Finetuning teams, building infrastructure that directly affects how models learn from and interpret data. The goal is more efficient and effective model training without sacrificing reliability or interpretability.

Responsibilities

  • Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
  • Optimize encoding techniques to improve model training efficiency and performance
  • Collaborate closely with research teams to understand evolving needs around data representation
  • Build infrastructure enabling researchers to experiment with novel tokenization approaches
  • Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline
  • Create robust testing frameworks to validate tokenization systems across diverse languages and data types
  • Identify and address bottlenecks in data processing pipelines related to tokenization
  • Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams

Requirements

  • Significant software engineering experience with demonstrated machine learning expertise
  • Proficiency in Python and familiarity with modern ML development practices
  • Experience with machine learning systems, data pipelines, or ML infrastructure
  • Strong analytical skills and the ability to evaluate the impact of engineering changes on research outcomes
  • Comfort navigating ambiguity and developing solutions in rapidly evolving research environments
  • Ability to work independently and collaboratively across cross-functional teams; strong communication and documentation skills
  • Bachelor's degree in a related field or equivalent experience

Strong Candidates May Also Have Experience With

  • Working with ML data processing pipelines and optimizing their performance
  • Building or optimizing data encodings for ML applications
  • Implementing or working with BPE, WordPiece, or other tokenization algorithms
  • Multi-language tokenization challenges and solutions
  • Distributed systems and parallel computing for ML workflows
  • Large language models or other transformer-based architectures (helpful but not required)
  • Research environments where engineering directly enables scientific progress

Compensation, Logistics, and Benefits

  • Annual base salary range: $320,000 - $405,000 USD
  • Total compensation package for full-time employees includes equity and benefits, and may include incentive compensation
  • Education requirement: at least a Bachelor's degree in a related field or equivalent experience
  • Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more time in office)
  • Visa sponsorship: Anthropic will make reasonable efforts to sponsor visas for candidates when possible and retains an immigration lawyer to assist
  • Deadline to apply: None (applications reviewed on a rolling basis)

About Working Here

Anthropic is a public benefit corporation headquartered in San Francisco. The company emphasizes large-scale, collaborative AI research, impact-focused work, and frequent research discussions. It offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office environment. The team values responsible AI development and diverse perspectives.