Staff / Senior Software Engineer, Inference

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 300,000-485,000 per year

SENIOR

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know the pitfalls and tricks;

10: exceptional knowledge, with a comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 4, Python @ 6, GCP @ 4, Algorithms @ 4, Distributed Systems @ 4, Machine Learning @ 4, AWS @ 4, Azure @ 4, Communication @ 4, Rust @ 6, LLM @ 7, Observability @ 4, AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Inference team builds and maintains the systems that serve Claude to millions of users worldwide, operating a compute-agnostic inference stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators and cloud platforms.

Responsibilities

Build and maintain high-performance, large-scale distributed systems for model serving and inference.
Maximize compute efficiency across heterogeneous accelerator fleets and cloud platforms.
Design and implement intelligent request routing, load balancing, and traffic management to optimize request distribution.
Autoscale compute fleets and manage production-grade deployment pipelines for releasing models to millions of users.
Integrate new AI accelerator platforms and support inference for new model architectures.
Implement inference optimizations such as batching, caching, and structured sampling.
Analyze observability and production telemetry to tune performance and reliability.
Support multi-region deployments and geographic routing for global customers.

Requirements

Significant software engineering experience, particularly with distributed systems.
Experience implementing and deploying machine learning systems at scale.
Experience or strong interest in LLM inference optimization, batching, and caching strategies.
Familiarity with load balancing, request routing, traffic management, and autoscaling.
Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure).
Proficiency in Python or Rust.
Results-oriented and collaborative, including pair programming, with an interest in ML systems and infrastructure.
Education: at least a Bachelor's degree in a related field or equivalent experience.

Representative projects

Designing intelligent routing algorithms across thousands of accelerators.
Autoscaling compute fleets to match supply with demand across production and research workloads.
Building production deployment pipelines for large-scale model releases.
Integrating new AI accelerator platforms and contributing inference features (e.g., prompt caching, structured sampling).
Supporting inference for new model architectures and managing multi-region deployments.

Compensation

Annual salary range: $300,000 - $485,000 USD.

Logistics

Locations: San Francisco, CA; New York City, NY; Seattle, WA (United States).
Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time.
Visa sponsorship: Anthropic states that it sponsors visas, will make reasonable efforts to do so, and retains an immigration lawyer to assist, though not every role or candidate can be sponsored successfully.
Applications are reviewed on a rolling basis.

How Anthropic describes culture and benefits

Collaborative research-driven environment; emphasis on communication and big-science research.
Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration.