Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Kubernetes @ 4
Python @ 6
GCP @ 4
Distributed Systems @ 4
Machine Learning @ 4
AWS @ 4
Communication @ 7
Rust @ 6
LLM @ 4
Observability @ 4
AI @ 4
Details
Anthropic’s Inference team builds and maintains the systems that serve Claude to millions of users worldwide. The team owns the full stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators, operating large-scale, compute-agnostic inference deployments. The role focuses on maximizing compute efficiency for production workloads while enabling research by providing high-performance inference infrastructure.
Responsibilities
- Design and implement distributed systems for inference at large scale, including intelligent routing and traffic management across thousands of accelerators.
- Build autoscaling systems to match compute supply with demand across production and research workloads.
- Develop production-grade deployment pipelines and release processes for models.
- Integrate and support new AI accelerator platforms and maintain hardware-agnostic deployments.
- Implement inference features such as batching, structured sampling, prompt caching, and other LLM inference optimizations.
- Analyze observability data and tune performance for real-world production workloads.
- Manage multi-region deployments and geographic routing for global customers.
Requirements
- Significant software engineering experience, particularly with large-scale, high-performance distributed systems.
- Experience implementing and deploying machine learning systems at scale.
- Familiarity with load balancing, request routing, traffic management, autoscaling, batching, caching, and other inference optimization strategies.
- Experience with Kubernetes and cloud infrastructure (AWS, GCP).
- Proficiency in Python or Rust.
- Strong results orientation and ability to work across responsibilities; good communication skills.
- Education: at least a Bachelor's degree in a related field or equivalent experience.
Strong candidates may also have experience with
- LLM inference optimization and productionization
- Integrating new accelerator hardware and working across multiple cloud platforms
- Observability and performance tuning for large-scale systems
Benefits and Perks
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours and hybrid work policy (staff in location-based roles are expected in the office about 25% of the time)
- Lovely office space
Logistics
- Location: London, United Kingdom (hybrid; staff expected to be in office at least ~25% of the time)
- Annual salary range: £225,000 - £325,000
- Visa sponsorship: Anthropic states that it sponsors visas and retains immigration counsel to assist where possible.
- Deadline to apply: None (applications reviewed on a rolling basis).