Prompt Engineer, Agent Prompts & Evals

at Anthropic

📍 New York City, United States
📍 San Francisco, United States

USD 320,000-405,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

LLM

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Software Development @ 3
Python @ 5
A/B Testing @ 3
CI/CD @ 3
Machine Learning @ 3
Hiring @ 3
Communication @ 3
API @ 3
Experimentation @ 3
NLP @ 3
AI @ 3
Prompt Engineering @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The product engineering team is hiring prompt and context engineers to build AI-first products, features, and evaluations that bridge model capabilities and product experience. The role focuses on designing system and feature prompts, building evaluation suites, and collaborating with product, research, and safeguards teams to deliver consistent, safe, and beneficial user experiences across consumer and API products.

Responsibilities

Design, test, and optimize system prompts and feature-specific prompts that shape Claude’s behavior across consumer and API products.
Build and maintain comprehensive evaluation suites to ensure model quality and consistency across product launches and updates.
Partner closely with product, research, and safeguards teams to ensure new features meet quality and safety standards.
Support model launches by catching regressions and ensuring smooth rollouts.
Contribute to the shared infrastructure (frameworks and tools) that lets teams develop and test prompts and features confidently.
Mentor product engineers on prompt engineering best practices and help teams build their first evaluations.
Iterate rapidly in a fast-paced environment as model capabilities advance.

Requirements

Required Qualifications

5+ years of software engineering experience with Python or similar languages.
Demonstrated experience with large language models (LLMs) and prompt engineering (via work, research, or significant personal projects).
Strong understanding of evaluation methodologies and metrics for AI systems.
Excellent written and verbal communication skills to explain complex model behaviors to diverse stakeholders.
Ability to manage multiple concurrent projects and prioritize effectively.
Experience with version control, CI/CD, and modern software development practices.

Preferred Qualifications

Experience with Claude or other frontier AI models in production settings.
Background in machine learning, NLP, or related fields.
Experience with A/B testing and experimentation frameworks (e.g., Statsig).
Familiarity with AI safety and alignment considerations.
Experience building tools and infrastructure for ML/AI workflows.
Track record of improving AI system performance through systematic evaluation and iteration.

Benefits

Competitive compensation and benefits.
Optional equity donation matching.
Generous vacation and parental leave.
Flexible working hours and a collaborative office space.

Logistics

Annual salary range: $320,000 - $405,000 USD.
Education: Bachelor's degree in a related field or equivalent experience required.
Location-based hybrid policy: staff are expected to be in one of the offices at least 25% of the time.
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship may not succeed for every role or candidate.