Staff / Senior Software Engineer, Cloud Inference

at Anthropic

📍 San Francisco, United States
📍 Seattle, United States

USD 300,000-485,000 per year

SENIOR

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning, GPU

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 4, Python @ 6, GCP @ 4, CI/CD @ 4, Distributed Systems @ 7, AWS @ 4, Azure @ 4, Networking @ 4, Rust @ 6, API @ 4, LLM @ 4, Observability @ 4, AI @ 4

Details

Anthropic’s Cloud Inference team scales and optimizes Claude to serve developers and enterprise customers across AWS, GCP, Azure, and future cloud service providers. The team owns end-to-end serving for Claude on each cloud platform, including API integration, request routing, inference execution, capacity management, and operations. Engineers on this team make infrastructure decisions that improve scale, cost-effectiveness, and reliability for large-scale LLM inference.

Responsibilities

Design and build infrastructure to serve Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models
Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms
Design and evolve CI/CD automation systems, including validation and deployment pipelines, to reliably ship new model versions across cloud platforms
Design interfaces and tooling abstractions across CSPs to enable cost-effective inference management and reduce per-platform complexity
Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across validation and production workloads
Optimize inference cost and performance across providers by designing workload placement and routing systems that choose cost-effective accelerators and regions
Contribute to inference features that must work consistently across platforms
Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on production workloads

Requirements

Significant software engineering experience with a strong background in high-performance, large-scale distributed systems serving millions of users
Experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure); exposure to Kubernetes or other container orchestration systems, or to Infrastructure as Code
Strong interest in inference and familiarity with LLM inference optimization, batching, caching, and serving strategies
Experience or interest in capacity management, cost optimization, and resource planning at scale across heterogeneous environments
Ability to collaborate cross-functionally with internal teams and external CSP partners
Highly autonomous, fast learner, and able to take ownership of problems end-to-end
Education: at least a Bachelor's degree in a related field or equivalent experience

Strongly Preferred / Nice-to-Have

Direct experience working with CSP partner teams to scale infrastructure across multiple platforms
Background building platform-agnostic tooling or abstraction layers across cloud providers
Hands-on experience with ML infrastructure (GPUs, TPUs, Trainium, or other AI accelerators)
Experience designing and building CI/CD systems that automate deployment and validation across cloud environments
Solid understanding of multi-region deployments, geographic routing, and global traffic management
Proficiency in Python or Rust

Logistics

Location: San Francisco, CA and Seattle, WA
Location-based hybrid policy: we expect staff to be in an office at least 25% of the time
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist when possible
Education requirement: Bachelor's degree or equivalent experience

Compensation & Benefits

Annual salary range: $300,000 - $485,000 USD
Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration.