Engineering Manager, Agent Prompts & Evals

USD 320,000-405,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences

A/B Testing, CI/CD, Communication, API, Experimentation, LLM, AI, Prompt Engineering (all at level 3)

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The company is building eval frameworks, system prompt pipelines, and regression-detection systems used to measure and ship model and prompt changes with confidence.

About the role

Anthropic is looking for an Engineering Manager to lead the Agent Prompts & Evals team. This team owns the infrastructure that lets Anthropic ship model and prompt changes with confidence — the eval frameworks, system prompt pipelines, and regression-detection systems that every model launch depends on. The team operates at the seam between product engineering and research, partnering with other eval groups, product teams, TPMs, and research PMs. The role combines platform ownership, hands-on partnership during model launches, and collaboration across teams.

Responsibilities

  • Lead and grow a team of prompt engineers and platform software engineers
  • Own the product-side eval platform: frameworks, dashboards, bulk runners, and CI integrations used to measure model behavior and catch regressions
  • Own system prompt infrastructure: versioning, deployment, rollback, and review tooling for prompts running in production across claude.ai, the API, and agentic surfaces
  • Be a steady hand through model launches; act as the operational backstop during high-stakes launch periods
  • Build durable collaboration with other eval groups: define ownership boundaries, shared roadmaps, and shared infrastructure practices
  • Recruit, close, and retain engineers who work at the intersection of product engineering and model behavior
  • Shape the team’s investment priorities (frontier eval development, model launch automation, deeper prompt engineering support)
  • Push toward measuring hard-to-measure properties such as behavioral drift, prompt quality, and harness parity, rather than only the easy metrics

Requirements

  • 8+ years in software engineering with 3+ years managing engineering teams, including experience leading a platform, infra, or developer-tooling team whose customers were other engineers
  • Track record of building tooling and processes that make it easy for other teams to do the right thing
  • Comfort managing a team with a mixed charter: platform ownership, service-to-other-teams, and launch-driven operational rhythm
  • Technical depth to engage on system design, review pipeline architecture, and be credible in technical debates; ability to read and review code and occasionally write it
  • Product mindset and willingness to wear multiple hats
  • Demonstrated ability to build and maintain peer relationships with partner orgs: negotiate ownership, align roadmaps, and hold ground without being territorial
  • Experience recruiting and closing senior ICs in a competitive market

Strong candidates may also have

  • Prior exposure to LLM evals, ML experimentation platforms, or model quality work
  • Experience with A/B testing infrastructure, feature flagging, or gradual rollout systems
  • Background in devtools, CI/CD platforms, or testing infrastructure at scale
  • Experience managing teams that sit between larger orgs and turning that position into an asset
  • Interest in AI safety and alignment

Compensation

Annual Salary: $320,000 - $405,000 USD

Logistics

  • Education: At least a Bachelor's degree in a related field or equivalent experience
  • Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist with the process

How we work / Culture

Anthropic emphasizes collaborative, large-scale research efforts and values strong communication skills. The company focuses on impactful research and holds frequent research discussions to align on the most promising directions.

Additional notes

Candidates are encouraged to apply even if they do not meet every qualification. Anthropic provides guidance on how candidates may use AI during the application process and warns about recruitment scams (legitimate contacts come only from @anthropic.com email addresses).