Research Engineer / Scientist, Tool Use Safety

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 315,000-425,000 per year

MIDDLE

✅ Hybrid

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ^?

Security @ 3 Python @ 3 Machine Learning @ 6 Communication @ 6 Mathematics @ 6 LLM @ 5

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Tool Use Team within Research focuses on making Claude the world's most capable, safe, reliable, and efficient model for tool use and agentic applications. This role involves advancing safe tool use by researching and shipping solutions that address prompt injection robustness, preventing data exfiltration via tools, defending against adversarial attacks in multi-turn agent conversations, and ensuring safety for long-horizon autonomous agents with access to many tools. You'll collaborate across research and engineering teams and own the full research lifecycle from identifying limitations to implementing solutions in production models.

Note: All interviews for this role are conducted in Python.

Responsibilities

Design and implement novel, scalable reinforcement learning methodologies that advance tool use safety.
Define and pursue research agendas that push the boundaries of tool use safety.
Build rigorous, realistic evaluations capturing the complexity of real-world tool use safety challenges.
Ship research advances that directly impact and protect millions of users.
Collaborate with safety research (e.g., Safeguards, Alignment Science), capabilities research, and product teams to drive breakthroughs and ship solutions to production.
Design, implement, and debug code across research and production ML stacks.
Contribute to a collaborative research culture via pair programming, technical discussions, and team problem-solving.

Requirements

Strong machine learning research or applied-research experience, or a strong quantitative background (physics, mathematics, quantitative finance) equivalent to the role.
Ability to write clean, reliable code and solid software engineering skills.
Strong communication skills to explain complex ideas to diverse audiences.
Hunger to learn and grow; demonstrated initiative regardless of years of experience.
At least a Bachelor's degree in a related field or equivalent experience.

Strong candidates may also have

Experience with tool use/agentic safety, trust & safety, or security.
Experience with reinforcement learning techniques and environments.
Experience with language model training, fine-tuning, or evaluation.
Experience building AI agents or autonomous systems.
Published influential work in ML areas, especially LLM safety & alignment.
Deep expertise in specialized areas (e.g., RL, security, mathematical foundations).
Experience shipping features or working closely with product teams.
Enthusiasm for pair programming and collaborative research.

Logistics

Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more office time).
Visa sponsorship: Anthropic does sponsor visas and will make reasonable efforts to secure visas for hires when possible.

Compensation

Annual salary range: $315,000 - $425,000 USD.

Benefits & Culture

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
Emphasis on high-impact, large-scale AI research, cross-team collaboration, and frequent research discussions.
Encouragement to apply even if applicants do not meet every listed qualification; Anthropic values diverse perspectives.

How we're different

Anthropic treats AI research as big science, focusing on a few large-scale research efforts with an emphasis on impact, interpretability, and steerability.
The team values communication, collaboration, and empirical approaches that draw on disciplines like physics and biology as well as computer science.

Application guidance

Guidance on candidates' AI usage during the application process is available via Anthropic's policy (link provided on the job page).