Machine Learning Systems Engineer, Safeguards Research

USD 315,000-340,000 per year
Mid-level
Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 6
  • Machine Learning @ 3
  • TensorFlow @ 5
  • Communication @ 3
  • PyTorch @ 5

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Safeguards Research Team (part of the Alignment Science team) conducts safety research and engineering to ensure advanced AI models can be deployed safely. As a Machine Learning Systems Engineer on this team, you will bridge research and engineering by developing robust end-to-end ML pipelines, scalable evaluation infrastructure, and tooling to analyze and mitigate risks in large language models. Your work will support safety evaluations, training/fine-tuning workflows, and automation for detecting and classifying jailbreak attempts.

Responsibilities

  • Design and implement ML pipelines for training and evaluating safety classifiers and detection models.
  • Build infrastructure for hyperparameter optimization and model selection across safety experiments.
  • Create flexible interfaces and dashboards for researchers to interact with models and experimental setups.
  • Create efficient data processing pipelines that handle large-scale model outputs and training datasets.
  • Develop tooling to automate the generation, analysis, and classification of jailbreak attempts (a minimal data-processing sketch follows this list).
  • Translate research ideas into production-quality ML systems and collaborate closely with researchers.
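
To give a concrete flavor of the pipeline and jailbreak-tooling work above, here is a minimal, hypothetical sketch that streams large JSONL files of model outputs and flags candidate jailbreak attempts. Everything in it is an illustrative assumption rather than Anthropic's actual tooling: the model_outputs directory layout, the {"prompt": ..., "completion": ...} schema, and the keyword heuristic standing in for a trained classifier.

    # Hypothetical sketch: stream model-output shards and flag candidate
    # jailbreak attempts. All names and the schema are illustrative, and the
    # keyword check is a toy stand-in for a trained safety classifier.
    import json
    from pathlib import Path
    from typing import Iterator

    SUSPICIOUS_MARKERS = ("ignore previous instructions", "pretend you are")

    def iter_records(path: Path) -> Iterator[dict]:
        """Yield records one at a time so multi-GB shards never load into memory."""
        with path.open() as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    def looks_like_jailbreak(record: dict) -> bool:
        """Placeholder heuristic; a real system would call a trained model here."""
        text = record.get("prompt", "").lower()
        return any(marker in text for marker in SUSPICIOUS_MARKERS)

    def scan_outputs(data_dir: Path) -> Iterator[dict]:
        for path in sorted(data_dir.glob("*.jsonl")):
            for record in iter_records(path):
                if looks_like_jailbreak(record):
                    yield record

    if __name__ == "__main__":
        for hit in scan_outputs(Path("model_outputs")):
            print(json.dumps(hit))

Streaming record by record keeps memory flat regardless of shard size, which matters at the data scales this role describes.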

Requirements

  • Strong foundation in machine learning fundamentals (e.g., understanding of overfitting and regularization).
  • Practical experience improving and evaluating ML models and intuition about hyperparameter optimization.
  • Proficiency with ML frameworks such as PyTorch, TensorFlow, or JAX, and the ability to implement custom training loops (a minimal sketch follows this list).
  • Strong software engineering skills, particularly with Python.
  • Experience building scalable data pipelines, interpretable dashboards, and ML infrastructure.
  • Experience with prompting and working with large language models; comfortable training and fine-tuning language models.
  • Preference for simple, reliable engineering solutions and ability to work in a fast-paced, collaborative research environment.
  • Care about the societal and ethical impacts of AI.
  • Education: at least a Bachelor's degree in a related field or equivalent experience.
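
As a sketch of the custom-training-loop requirement, the snippet below trains a tiny binary safety classifier over pre-computed embeddings in PyTorch. The synthetic data, shapes, and hyperparameters are illustrative assumptions; the weight_decay term is one simple regularization lever of the kind the fundamentals bullet alludes to.

    # Hypothetical sketch of a custom PyTorch training loop for a small binary
    # safety classifier over pre-computed embeddings. The synthetic tensors
    # stand in for embedded (prompt, label) pairs.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    EMBED_DIM, BATCH_SIZE, EPOCHS = 128, 32, 3

    X = torch.randn(1024, EMBED_DIM)
    y = torch.randint(0, 2, (1024,)).float()
    loader = DataLoader(TensorDataset(X, y), batch_size=BATCH_SIZE, shuffle=True)

    model = nn.Sequential(nn.Linear(EMBED_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
    # weight_decay adds L2-style regularization to curb overfitting.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
    loss_fn = nn.BCEWithLogitsLoss()

    for epoch in range(EPOCHS):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            logits = model(xb).squeeze(-1)
            loss = loss_fn(logits, yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * xb.size(0)
        print(f"epoch {epoch}: mean loss {total / len(X):.4f}")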

Strong candidates may also have:

  • Experience building systems that integrate with large language models.
  • Experience with distributed computing systems or parallel processing (see the sketch after this list).
  • Experience implementing data processing pipelines at scale.
  • Contributions to open-source machine learning or AI safety tools.
  • Experience with cloud infrastructure and containerization.
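
For the parallel-processing bullet, here is a minimal, hypothetical sketch that fans shard-level analysis of model outputs across worker processes with Python's multiprocessing. The directory layout, JSONL schema, and the summarize_shard helper are assumptions for illustration, not a specific production system.

    # Hypothetical sketch: process model-output shards in parallel.
    import json
    from multiprocessing import Pool
    from pathlib import Path

    def summarize_shard(path: Path) -> tuple[str, int, float]:
        """Per-shard summary; a real pipeline would run heavier analysis here."""
        count, total_len = 0, 0
        with path.open() as f:
            for line in f:
                record = json.loads(line)
                count += 1
                total_len += len(record.get("completion", ""))
        mean_len = total_len / count if count else 0.0
        return path.name, count, mean_len

    if __name__ == "__main__":
        shards = sorted(Path("model_outputs").glob("*.jsonl"))
        # Each worker owns whole shards, so no cross-process coordination is needed.
        with Pool(processes=4) as pool:
            for name, count, mean_len in pool.imap_unordered(summarize_shard, shards):
                print(f"{name}: {count} records, mean completion length {mean_len:.1f}")

Giving each worker whole shards avoids cross-process coordination; at larger scale the same shape maps naturally onto a cluster scheduler.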

Logistics

  • Location: San Francisco, CA (office presence expected).
  • Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time; some roles may require more time in-office.
  • Visa sponsorship: Anthropic sponsors visas and makes reasonable efforts to support successful candidates through the process, though sponsorship cannot be guaranteed for every role or candidate.
  • Candidates are encouraged to apply even if they do not meet every listed qualification.

Benefits

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office environment.
  • Opportunity to work on high-impact AI safety research and infrastructure that directly enables safe deployment of advanced models.

How we're different

  • Anthropic emphasizes large-scale, high-impact AI research and close collaboration across research and engineering teams. The team values communication, empirical research practices, and long-term impact in AI safety.