Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes (3), Machine Learning (3), Communication (3), NLP (3), LLM (3)
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The Safeguards Research Team conducts critical safety research and engineering to ensure AI systems can be deployed safely. The role focuses on risks from advanced AI systems (ASL-3 and beyond), with projects including jailbreak robustness, automated red-teaming, monitoring techniques, and applied threat modeling.
Responsibilities
- Test robustness of safety techniques by training language models to subvert them.
- Run multi-agent reinforcement learning experiments such as AI Debate.
- Build tooling to evaluate LLM-generated jailbreak effectiveness.
- Write scripts and prompts for evaluating models' reasoning in safety-relevant contexts.
- Contribute to research papers, blog posts, and talks.
- Run experiments informing AI safety efforts and policies.
Requirements
- Significant software, machine learning, or research engineering experience.
- Some experience contributing to empirical AI research projects.
- Familiarity with technical AI safety research.
- Collaborative mindset and willingness to take on tasks beyond the strict job description.
- Care about the impacts of AI.
Strong candidates may also have
- Experience authoring ML, NLP, or AI safety research papers.
- Experience with large language models (LLMs).
- Experience with reinforcement learning.
- Experience with Kubernetes clusters and complex shared codebases.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Office space in San Francisco.
- Visa sponsorship available with immigration support.
- Hybrid work policy requiring presence in the office at least 25% of the time.
Additional Details
- Requires at least a Bachelor's degree or equivalent experience.
- Preference for candidates able to be based in the Bay Area or to travel there ~25% of the time.
About Anthropic
Anthropic values high-impact, "big science" AI research, emphasizing collaborative and empirical work with frequent research discussions. The team pursues trustworthy and steerable AI while valuing communication skills and diverse perspectives.