Machine Learning Engineer, Safeguards Research

USD 315,000-340,000 per year
Mid-level
Hybrid

Required Skills & Competences

  • Python @ 6
  • Machine Learning @ 3
  • TensorFlow @ 5
  • PyTorch @ 5

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The Safeguards Research Team focuses on critical safety research and engineering to ensure AI systems can be deployed safely, enabling responsible scaling of state-of-the-art models.

Responsibilities

  • Design and implement ML pipelines for training and evaluating safety classifiers and detection models
  • Develop systems to fine-tune language models for specific safety evaluation tasks
  • Build infrastructure for hyperparameter optimization and model selection for safety experiments
  • Create efficient data processing pipelines to manage large-scale model outputs and training datasets
  • Develop tooling to automate generation, analysis, and classification of jailbreak attempts
  • Build evaluation frameworks to systematically test model behaviors across safety dimensions
  • Create flexible interfaces for researchers to experiment with varying model architectures and training configurations

Requirements

  • Hands-on experience training and fine-tuning basic ML models
  • Understanding of fundamental ML concepts such as overfitting and regularization
  • Practical experience improving and evaluating ML models
  • Proficiency with ML frameworks like PyTorch, TensorFlow, or JAX, including custom training loops
  • Strong software engineering skills, especially in Python
  • Experience building scalable data pipelines and ML infrastructure
  • Familiarity with prompting and working with large language models
  • Preference for simple, reliable ML engineering solutions
  • Comfort working in a fast-paced, collaborative research environment
  • Commitment to understanding and addressing AI impacts

Strong candidates may also have:

  • Implemented custom loss functions and evaluation metrics
  • Experience with experiment and evaluation tracking tools
  • Built integrated training, evaluation, and deployment pipelines
  • Contributed to open-source machine learning or AI safety tools

Benefits

  • Competitive compensation, including an annual salary range of $315,000-$340,000 USD
  • Hybrid work policy, with staff expected in the office at least 25% of the time
  • Visa sponsorship support with immigration lawyer assistance
  • Generous vacation and parental leave
  • Flexible working hours
  • Collaborative and mission-driven team environment with frequent research discussions

Education

  • At least a Bachelor's degree in a related field or equivalent experience

Location

  • San Francisco, CA

About Anthropic

Anthropic is a public benefit corporation focused on building trustworthy AI systems, fostering diversity and inclusion, and placing high impact and ethical considerations at the core of AI research.