Staff Software Engineer, Inference

at Anthropic

📍 London, United Kingdom

GBP 325,000-390,000 per year

SENIOR

✅ Hybrid

✅ Visa Sponsorship

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 4
Python @ 4
GCP @ 4
Algorithms @ 4
Distributed Systems @ 3
Machine Learning @ 4
AWS @ 4
Communication @ 4
Performance Optimization @ 3
Rust @ 4
LLM @ 3
Observability @ 4
AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The team is a quickly growing group of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Role overview

The Inference team builds and maintains the critical systems that serve Claude to millions of users worldwide. The team operates the full stack, from intelligent request routing to fleet-wide orchestration across diverse AI accelerators and multiple cloud platforms. Its dual mandate is to maximize compute efficiency to support customer growth and to enable research by providing high-performance inference infrastructure for scientists.

As a Staff Software Engineer on the Inference team, you will work end-to-end to identify and address the infrastructure blockers to serving Claude at scale while enabling research. Strong candidates are familiar with performance optimization, distributed systems, large-scale service orchestration, and intelligent request routing. Familiarity with LLM inference optimization, batching strategies, and multi-accelerator deployments is highly encouraged but not strictly required.

Responsibilities

Build and maintain large-scale, compute-agnostic inference deployments that serve models in production
Design and implement intelligent routing algorithms to optimize request distribution across thousands of accelerators
Implement autoscaling of compute fleets to match supply with demand across production, research, and experimental workloads
Build production-grade deployment pipelines for releasing new models to millions of users
Integrate new AI accelerator platforms and support multi-accelerator deployments to maintain hardware-agnostic capabilities
Contribute to inference features such as structured sampling and prompt caching
Analyze observability data to tune performance based on real-world production workloads
Manage multi-region deployments and geographic routing for global customers

Requirements

Significant software engineering experience, particularly with distributed systems
Familiarity with performance optimization and large-scale service orchestration
Experience or familiarity with load balancing, request routing, or traffic management systems
Familiarity or experience with LLM inference optimization, batching, and caching strategies is highly encouraged
Experience with Kubernetes and cloud infrastructure (AWS, GCP)
Experience with Python or Rust
Results-oriented, flexible, and able to work across role boundaries
Interest in learning more about machine learning systems and infrastructure, and concern for the societal impacts of AI
Minimum of a Bachelor's degree in a related field or equivalent experience

Representative projects (examples)

Designing intelligent routing algorithms for thousands of accelerators
Autoscaling compute fleets for production and research workloads
Building deployment pipelines for new model releases
Integrating new AI accelerator platforms
Contributing inference features (e.g., structured sampling, prompt caching)
Tuning performance via observability data
Managing multi-region deployments and geographic routing

Logistics

Location: London, United Kingdom (location-based hybrid policy: staff are expected to be in the office at least 25% of the time)
Education: At least a Bachelor's degree in a related field or equivalent experience
Visa sponsorship: The company sponsors visas and retains an immigration lawyer to help with sponsorship where reasonable

Compensation

Annual salary range: £325,000 to £390,000

How we work / Culture

Collaborative, research-driven environment focused on large-scale, high-impact AI research
Emphasis on communication and cross-functional collaboration
Encouragement to apply even if you do not meet every listed qualification