Staff + Senior Software Engineer, Inference

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 320,000-485,000 per year

MIDDLE SENIOR

✅ Hybrid

✅ Visa Sponsorship

Tech Stack
Tag name is followed by "@" symbol and proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

AI, AWS @ 3, Algorithms, Azure @ 3, Distributed Systems @ 3, GCP @ 3, Kubernetes @ 3, LLM @ 2, Machine Learning @ 3, Networking, Observability, Python @ 5, Rust @ 5, Slack @ 3

Details

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

Our Inference team is responsible for building and maintaining the critical systems that serve Claude to millions of users worldwide. We bring Claude to life by serving our models via the industry’s largest compute-agnostic inference deployments. We are responsible for the entire stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators.

The team has a dual mandate: maximizing compute efficiency to reliably serve our explosive customer growth, while enabling breakthrough research by giving our scientists the high-performance inference infrastructure they need to develop next-generation models. We tackle complex distributed systems challenges across multiple accelerator families and emerging AI hardware running on multiple cloud platforms.

Inference systems are highly performance-sensitive distributed systems. Inference serves hundreds of thousands of customers every day, and the size and span of the inference fleet require sophisticated routing, scaling, and networking systems.

Responsibilities

Design, build, and maintain the distributed systems that serve Claude to millions of users worldwide
Develop resilient, flexible systems that adapt in real time to real-world events
Develop intelligent request routing, load balancing, and traffic management systems across thousands of accelerators
Maximize compute efficiency across the fleet by autoscaling and orchestrating production, research, and experimental workloads
Build and operate production-grade deployment pipelines for releasing new models to users
Provide high-performance inference infrastructure that enables researchers to develop next-generation models
Integrate new AI accelerator platforms and support inference for new model architectures

Minimum qualifications

Significant software engineering experience, particularly with distributed systems
Results-oriented, with a bias towards flexibility and impact
Willingness to pick up slack, even if it goes outside your job description
Desire to learn more about machine learning systems and infrastructure
Thrive in environments where technical excellence directly drives both business results and research breakthroughs
Care about the societal impacts of your work

Preferred qualifications

Experience with high-performance, large-scale distributed systems
Experience implementing and deploying machine learning systems at scale
Experience with load balancing, request routing, or traffic management systems
Familiarity with LLM inference optimization, batching, and caching strategies
Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure)
Proficiency in Python or Rust

Representative projects

Designing intelligent routing algorithms that optimize request distribution across many accelerators in different environments
Autoscaling our compute fleet to dynamically match supply with demand across production, research, and experimental workloads
Building production-grade deployment pipelines for releasing new models to millions of users reliably
Contributing to new inference features
Supporting inference for new model architectures
Analyzing observability data to tune performance based on real-world production workloads
Managing multi-region deployments and geographic routing for global customers

Logistics

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas, but not every role/candidate can be sponsored. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position