Machine Learning Systems Engineer, Encodings and Tokenization

USD 320,000-405,000 per year
Mid-level
Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python (level 2)
  • Algorithms (level 3)
  • Distributed Systems (level 3)
  • Machine Learning (level 3)
  • Performance Optimization (level 3)
  • Debugging (level 3)

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team includes researchers, engineers, policy experts, and business leaders focused on building beneficial AI systems.

Responsibilities

  • Design, develop, and maintain tokenization systems used across pretraining and fine-tuning workflows
  • Optimize encoding techniques to improve model training efficiency and performance
  • Collaborate with research teams to understand evolving data representation needs
  • Build infrastructure enabling researchers to experiment with novel tokenization approaches
  • Implement monitoring and debugging systems for tokenization-related issues in training pipelines
  • Create robust testing frameworks to validate tokenization systems across diverse languages and data types
  • Identify and resolve tokenization-related bottlenecks in data processing pipelines
  • Document systems and communicate technical decisions clearly to stakeholders

Requirements

  • Significant software engineering experience with demonstrated machine learning expertise
  • Comfortable working independently and collaboratively in rapidly evolving research environments
  • Results-oriented, flexible, and focused on impact
  • Experience with machine learning systems, data pipelines, or ML infrastructure
  • Proficient in Python and familiar with modern ML development practices
  • Strong analytical skills for evaluating the impact of engineering changes on research outcomes
  • Willingness to take on tasks outside your assigned job description
  • Enjoyment of pair programming
  • Commitment to responsible AI development and awareness of societal impacts

Strong Candidates May Also Have Experience With

  • Machine learning data processing pipelines
  • Building or optimizing data encodings for ML applications
  • Implementing or working with BPE, WordPiece, or other tokenization algorithms
  • Performance optimization of ML data processing systems
  • Multi-language tokenization challenges and solutions
  • Research environments linking engineering and scientific progress
  • Distributed systems and parallel computing for ML workflows
  • Large language models or transformer-based architectures (not required)

Benefits and Logistics

  • Competitive salary between $320,000 and $405,000 USD annually
  • Location-based hybrid office policy with at least 25% in-office time (San Francisco, CA)
  • Visa sponsorship possible
  • Bachelor’s degree or equivalent experience required
  • Inclusive, collaborative team environment focused on impactful AI research
  • Flexible working hours, generous vacation and parental leave, equity matching, and nice office space

How We're Different

  • Focused on large-scale, high-impact AI research
  • Collaborative and communicative culture with frequent research discussions
  • Research continues directions the team has worked on previously, including GPT-3, interpretability, multimodal neurons, scaling laws, AI safety, and learning from human preferences

Application

Applications are reviewed on a rolling basis. Candidates are encouraged to apply even if they do not meet every qualification.