Research Engineer, Agents

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 315,000-425,000 per year

SENIOR

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Machine Learning @ 7
Communication @ 4
LLM @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. The team consists of researchers, engineers, policy experts, and business leaders working together toward that goal.

Responsibilities

Finetune new capabilities into Claude that maximize its performance or ease of use on agentic tasks
Develop tools for agents (e.g., memory, context compression, communication architectures) and compare their performance
Systematically discover and test prompt engineering best practices for agents
Develop automated techniques for designing and evaluating agentic systems
Assist with automated evaluation of Claude models and prompts across the training and product lifecycle
Collaborate with the product organization to solve challenges applying agents to products
Create and optimize data mixes for model training
Maintain infrastructure for efficient prompt iteration and testing

Requirements

7+ years of machine learning and software engineering experience
High-level familiarity with large language model (LLM) architecture and operation
Extensive experience exploring and testing language model behavior
Experience with prompting and/or building products using language models
Strong communication skills and an interest in collaborating with other researchers on difficult problems
Passion for making powerful technology safe and socially beneficial
Habit of keeping up to date with emerging research and industry trends
Enjoyment of pair programming

Preferred Experience

Developing complex agentic systems using LLMs
Large-scale reinforcement learning on language models
Multi-agent systems

Representative Projects

Implementing and testing novel architectures for retrieval, tool use, sub-agents, or memory for Claude
Finetuning Claude to maximize performance using agent tools like read-write memory or inter-agent communication
Building prompting and orchestration for production LLM applications
Creating automatic prompt optimizers or LLM-driven evaluation systems
Building scaled model evaluation frameworks using model-based evaluation techniques

Logistics

Education: Bachelor's degree in a related field or equivalent experience
Hybrid work policy: Expected to be in office at least 25% of the time; some roles may require more
Visa sponsorship: available; the company makes a reasonable effort to secure visas

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
Collaborative office space in San Francisco

Company Culture

Focus on high-impact, big-science AI research
Collaborative and cohesive team
Frequent research discussions, with communication skills highly valued
Research directions aligned with major AI advancements and safety

AI Usage Guidance

The company provides its policy on AI usage during the application process.