Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Kubernetes @ 6
Python @ 4
Communication @ 7
Rust @ 4
GPU @ 4
Observability @ 4
AI @ 4
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Launch Engineering team makes inference deployment boring and unattended by designing and building deployment infrastructure that moves inference code from merge to production across resource-constrained accelerator fleets (GPU, TPU, Trainium). This role focuses on orchestration, capacity-aware scheduling, observability, and pipeline architectures that reduce cycle time and minimize disruption to serving capacity.
Responsibilities
- Own deployment orchestration that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions
- Improve capacity-aware deployment scheduling to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes
- Extend deployment observability — dashboards and tooling that answer "what code is running in production," "where is my commit," and "what validation passed for this deploy"
- Drive down cycle time from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism
- Optimize fleet rollout strategies for large-scale deployments across thousands of accelerator chips, minimizing disruption to serving capacity
- Evolve self-service model onboarding so new models can be added to the continuous deployment pipeline without Launch Engineering involvement
- Partner across the Inference organization with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems
Minimum qualifications
- Strong software engineering skills, including experience designing systems that manage complex state machines and multi-stage pipelines
- Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration
- Experience building deployment, release, or delivery infrastructure where resource constraints (fleet capacity, network bandwidth, hardware availability, coordinated rollout windows) shape the design
- A track record of building automation that measurably improves deployment velocity and reliability
- Comfort working across the stack — from backend services and databases to CLI tools and web UIs
- Strong communication skills and the ability to work closely with oncall engineers, model teams, and infrastructure partners
Preferred qualifications
- 5+ years of experience building deployment, release, or delivery infrastructure at scale
- Experience with Python and/or Rust in production systems
- Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)
- Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)
- Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback
- Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)
Compensation
Annual Salary: $320,000 - $485,000 USD
Logistics
- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Location-based hybrid policy: currently, staff are expected to be in one of the offices at least 25% of the time (some roles may require more time in office)
- Visa sponsorship: Anthropic states they sponsor visas and retain an immigration lawyer to assist when they make an offer
How we're different
Anthropic works as a cohesive team on a few large-scale research efforts, values communication, and emphasizes impact on steerable, trustworthy AI. The team is collaborative and frequently hosts research discussions.
Application notes
The posting encourages applicants who may not meet every qualification to apply and includes candidate guidance about AI usage in the application process.