Used Tools & Technologies
IaC
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 4
Kubernetes @ 4
Python @ 6
GCP @ 4
CI/CD @ 4
Distributed Systems @ 7
AWS @ 4
Azure @ 4
Communication @ 4
Networking @ 4
Rust @ 6
API @ 4
LLM @ 4
Observability @ 4
AI @ 4
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Cloud Inference team scales and optimizes Claude to serve developers and enterprise customers across AWS, GCP, Azure, and future cloud service providers. The team owns the end-to-end product of Claude on each cloud platform, including API integration, request routing, inference execution, capacity management, and operations.
This role focuses on building high-performance, large-scale backend services and infrastructure to serve LLMs across heterogeneous cloud providers while optimizing for reliability, cost, and performance.
Responsibilities
- Design, build, and own backend services and infrastructure that serve Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models
- Work cross-functionally with internal inference, product API, systems, and security teams and with CSP partners to stand up the full serving stack on new cloud platforms and resolve operational issues
- Build and evolve CI/CD automation systems, including validation and deployment pipelines that reliably ship new model versions at scale
- Design interfaces and tooling abstractions across CSPs to enable cost-effective inference management and reduce per-platform complexity
- Contribute to capacity planning, autoscaling, and workload routing strategies to match supply with demand and route requests to cost-effective accelerators and regions
- Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on production workloads
Minimum qualifications
- Significant software engineering experience with a strong background in high-performance, large-scale distributed systems serving millions of users
- Experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure) with exposure to Kubernetes, Infrastructure as Code, or container orchestration
- Curious about LLM serving (prior inference/ML experience not required)
- Comfortable working cross-functionally with internal teams and external partners
- Experience aligning goals and delivering impact with external partners
- Fast learner who can quickly ramp on new technologies, hardware platforms, and provider ecosystems
- Highly autonomous and able to take end-to-end ownership, including work outside a strict job description
Preferred qualifications
- Direct experience working with CSPs to scale infrastructure or products across multiple platforms and navigating differences in networking, security, privacy, billing, and managed services
- Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments
- Solid understanding of multi-region deployments, geographic routing, and global traffic management
- Proficiency in Python or Rust
Compensation
- Annual salary range: $320,000 - $485,000 USD
Logistics
- Minimum education: Bachelor’s degree or equivalent combination of education, training, and/or experience
- Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time; some roles may require more office presence
Visa sponsorship
- Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship is not guaranteed for every role or candidate
How we're different
Anthropic emphasizes large-scale, collaborative AI research with a focus on steerable and trustworthy systems. The team values communication and works on a few large-scale research efforts with high impact.