Staff + Senior Software Engineer, Inference

USD 320,000-485,000 per year
SENIOR
✅ Hybrid
✅ Visa Sponsorship

Required Skills & Competences

Kubernetes @ 4, Python @ 6, GCP @ 4, Algorithms @ 4, Distributed Systems @ 4, Machine Learning @ 4, AWS @ 4, Azure @ 4, Rust @ 6, LLM @ 3, Observability @ 4, AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Inference team builds and maintains the systems that serve Claude to millions of users worldwide, delivering compute-agnostic inference deployments across diverse AI accelerators and cloud platforms. The team focuses on maximizing compute efficiency and enabling research by providing high-performance inference infrastructure.

Responsibilities

  • Design, build, and maintain distributed systems that serve Claude to millions of users worldwide
  • Develop intelligent request routing, load balancing, and traffic management systems across thousands of accelerators
  • Maximize compute efficiency across the fleet by autoscaling and orchestrating production, research, and experimental workloads
  • Build and operate production-grade deployment pipelines for releasing new models to users
  • Provide high-performance inference infrastructure that enables researchers to develop next-generation models
  • Integrate new AI accelerator platforms and support inference for new model architectures
  • Use observability data to tune and improve performance based on real-world production workloads
  • Manage multi-region deployments and geographic routing for global customers

Requirements

Minimum qualifications:

  • Have significant software engineering experience, particularly with distributed systems
  • Are results-oriented, with a bias towards flexibility and impact
  • Pick up slack, even if it goes outside your job description
  • Enjoy pair programming
  • Want to learn more about machine learning systems and infrastructure
  • Thrive in environments where technical excellence directly drives both business results and research breakthroughs
  • Care about the societal impacts of your work

Preferred qualifications:

  • Experience with high-performance, large-scale distributed systems
  • Experience implementing and deploying machine learning systems at scale
  • Experience with load balancing, request routing, or traffic management systems
  • Familiarity with LLM inference optimization, batching, and caching strategies
  • Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure)
  • Proficiency in Python or Rust

Representative projects (examples of work you might do):

  • Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators
  • Autoscaling the compute fleet to match supply with demand across production, research, and experimental workloads
  • Building production-grade deployment pipelines for releasing new models to millions of users
  • Integrating new AI accelerator platforms to maintain a hardware-agnostic advantage
  • Contributing to inference features (e.g., structured sampling, prompt caching)
  • Supporting inference for new model architectures
  • Analyzing observability data to tune performance based on real-world production workloads

Logistics & Additional Information

  • Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
  • Location-based hybrid policy: staff are expected to be in one of Anthropic’s offices at least 25% of the time
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though not every role or candidate can be successfully sponsored
  • Applications reviewed on a rolling basis

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Office space for collaboration