Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Kubernetes @ 4
Python @ 4
GCP @ 4
Algorithms @ 4
Distributed Systems @ 3
Machine Learning @ 4
AWS @ 4
Communication @ 4
Performance Optimization @ 3
Rust @ 4
LLM @ 3
Observability @ 4
AI @ 4
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The team is a quickly growing group of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
Role overview
The Inference team builds and maintains the critical systems that serve Claude to millions of users worldwide. The team operates the full stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators and multiple cloud platforms. The dual mandate is to maximize compute efficiency for customer growth and enable research by providing high-performance inference infrastructure for scientists.
As a Staff Software Engineer on the Inference team, you will work end-to-end to identify and address infrastructure blockers to serve Claude at scale while enabling research. Strong candidates have familiarity with performance optimization, distributed systems, large-scale service orchestration, and intelligent request routing. Familiarity with LLM inference optimization, batching strategies, and multi-accelerator deployments is highly encouraged but not strictly required.
Responsibilities
- Build and maintain large-scale, compute-agnostic inference deployments that serve models in production
- Design and implement intelligent routing algorithms to optimize request distribution across thousands of accelerators
- Implement autoscaling of compute fleets to match supply with demand across production, research, and experimental workloads
- Build production-grade deployment pipelines for releasing new models to millions of users
- Integrate new AI accelerator platforms and support multi-accelerator deployments to maintain hardware-agnostic capabilities
- Contribute to inference features such as structured sampling and prompt caching
- Analyze observability data to tune performance based on real-world production workloads
- Manage multi-region deployments and geographic routing for global customers
Requirements
- Significant software engineering experience, particularly with distributed systems
- Familiarity with performance optimization and large-scale service orchestration
- Experience or familiarity with load balancing, request routing, or traffic management systems
- Familiarity or experience with LLM inference optimization, batching, and caching strategies is highly encouraged
- Experience with Kubernetes and cloud infrastructure (AWS, GCP)
- Experience with Python or Rust
- Results-oriented, flexible, and able to work across role boundaries
- Interest in learning more about machine learning systems and infrastructure, and care for the societal impacts of AI
- Minimum of a Bachelor's degree in a related field or equivalent experience
Representative projects (examples)
- Designing intelligent routing algorithms for thousands of accelerators
- Autoscaling compute fleets for production and research workloads
- Building deployment pipelines for new model releases
- Integrating new AI accelerator platforms
- Contributing inference features (e.g., structured sampling, prompt caching)
- Tuning performance via observability data
- Managing multi-region deployments and geographic routing
Logistics
- Location: London, United Kingdom (location-based hybrid policy: staff expected in office at least 25% of the time)
- Education: At least a Bachelor's degree in a related field or equivalent experience
- Visa sponsorship: The company sponsors visas and retains an immigration lawyer to help with sponsorship where reasonable
Compensation
- Annual salary range: £325,000 - £390,000
How we work / Culture
- Collaborative, research-driven environment focused on large-scale, high-impact AI research
- Emphasis on communication and cross-functional collaboration
- Encouragement to apply even if you do not meet every listed qualification