Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 6, Machine Learning @ 3, TensorFlow @ 5, PyTorch @ 5
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The Safeguards Research Team focuses on critical safety research and engineering to ensure AI systems can be deployed safely, enabling responsible scaling of state-of-the-art models.
Responsibilities
- Design and implement ML pipelines for training and evaluating safety classifiers and detection models
- Develop systems to fine-tune language models for specific safety evaluation tasks
- Build infrastructure for hyperparameter optimization and model selection for safety experiments
- Create efficient data processing pipelines to manage large-scale model outputs and training datasets
- Develop tooling to automate generation, analysis, and classification of jailbreak attempts
- Build evaluation frameworks to systematically test model behaviors across safety dimensions
- Create flexible interfaces for researchers to experiment with varying model architectures and training configurations
Requirements
- Hands-on experience training and fine-tuning basic ML models
- Understanding of fundamental ML concepts such as overfitting and regularization
- Practical experience improving and evaluating ML models
- Proficiency with ML frameworks like PyTorch, TensorFlow, or JAX, including custom training loops
- Strong software engineering skills, especially in Python
- Experience building scalable data pipelines and ML infrastructure
- Familiarity with prompting and working with large language models
- Preference for simple, reliable ML engineering solutions
- Comfort working in a fast-paced, collaborative research environment
- Commitment to understanding and addressing AI impacts
Strong candidates may also have:
- Implemented custom loss functions and evaluation metrics
- Experience with experiment and evaluation tracking tools
- Built integrated training, evaluation, and deployment pipelines
- Contributed to open-source machine learning or AI safety tools
Benefits
- Competitive compensation, including an annual salary range of $315,000 to $340,000 USD
- Hybrid work policy with expected office presence at least 25% of the time
- Visa sponsorship support with immigration lawyer assistance
- Generous vacation and parental leave
- Flexible working hours
- Collaborative and mission-driven team environment with frequent research discussions
Education
- At least a Bachelor's degree in a related field or equivalent experience
Location
- San Francisco, CA
About Anthropic
Anthropic is a public benefit corporation focused on building trustworthy AI systems, fostering diversity and inclusion, and placing high impact and ethical considerations at the core of AI research.