Staff Software Engineer, Inference

GBP 325,000-390,000 per year
SENIOR
✅ Hybrid
✅ Visa Sponsorship

Required Skills & Competences

Kubernetes @ 4, Python @ 4, GCP @ 4, Algorithms @ 4, Distributed Systems @ 3, Machine Learning @ 4, AWS @ 4, Communication @ 4, Performance Optimization @ 3, Rust @ 4, LLM @ 3, Observability @ 4, AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The team is a quickly growing group of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Role overview

The Inference team builds and maintains the critical systems that serve Claude to millions of users worldwide. The team operates the full stack, from intelligent request routing to fleet-wide orchestration across diverse AI accelerators and multiple cloud platforms. Its dual mandate is to maximize compute efficiency in support of customer growth and to enable research by providing scientists with high-performance inference infrastructure.

As a Staff Software Engineer on the Inference team, you will work end-to-end to identify and remove the infrastructure blockers to serving Claude at scale while enabling research. Strong candidates are familiar with performance optimization, distributed systems, large-scale service orchestration, and intelligent request routing. Familiarity with LLM inference optimization, batching strategies, and multi-accelerator deployments is highly encouraged but not strictly required.

Responsibilities

  • Build and maintain large-scale, compute-agnostic inference deployments that serve models in production
  • Design and implement intelligent routing algorithms to optimize request distribution across thousands of accelerators
  • Implement autoscaling of compute fleets to match supply with demand across production, research, and experimental workloads
  • Build production-grade deployment pipelines for releasing new models to millions of users
  • Integrate new AI accelerator platforms and support multi-accelerator deployments to maintain hardware-agnostic capabilities
  • Contribute to inference features such as structured sampling and prompt caching
  • Analyze observability data to tune performance based on real-world production workloads
  • Manage multi-region deployments and geographic routing for global customers

Requirements

  • Significant software engineering experience, particularly with distributed systems
  • Familiarity with performance optimization and large-scale service orchestration
  • Experience or familiarity with load balancing, request routing, or traffic management systems
  • Familiarity or experience with LLM inference optimization, batching, and caching strategies is highly encouraged
  • Experience with Kubernetes and cloud infrastructure (AWS, GCP)
  • Experience with Python or Rust
  • Results-oriented, flexible, and able to work across role boundaries
  • Interest in learning more about machine learning systems and infrastructure, and concern for the societal impacts of AI
  • Minimum of a Bachelor's degree in a related field or equivalent experience

Representative projects (examples)

  • Designing intelligent routing algorithms for thousands of accelerators
  • Autoscaling compute fleets for production and research workloads
  • Building deployment pipelines for new model releases
  • Integrating new AI accelerator platforms
  • Contributing inference features (e.g., structured sampling, prompt caching)
  • Tuning performance via observability data
  • Managing multi-region deployments and geographic routing

Logistics

  • Location: London, United Kingdom (location-based hybrid policy: staff expected in office at least 25% of the time)
  • Education: At least a Bachelor's degree in a related field or equivalent experience
  • Visa sponsorship: The company sponsors visas and retains an immigration lawyer to help with sponsorship where reasonable

Compensation

  • Annual salary range: £325,000 - £390,000

How we work / Culture

  • Collaborative, research-driven environment focused on large-scale, high-impact AI research
  • Emphasis on communication and cross-functional collaboration
  • Encouragement to apply even if you do not meet every listed qualification