Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. Able to teach others and familiar with all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Kubernetes @ 5
Python @ 3
Communication @ 6
Rust @ 3
GPU @ 3
Observability @ 3
AI @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Launch Engineering team makes inference deployment continuous and unattended — moving inference code from merge to production across GPU, TPU, and Trainium fleets while minimizing disruption to serving capacity.
Responsibilities
- Own deployment orchestration that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions
- Improve capacity-aware deployment scheduling to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes
- Extend deployment observability — dashboards and tooling that answer "what code is running in production," "where is my commit," and "what validation passed for this deploy"
- Drive down cycle time from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism
- Optimize fleet rollout strategies for large-scale deployments across thousands of GPU, TPU, and Trainium chips, minimizing disruption to serving capacity
- Evolve self-service model onboarding so that new models can be added to the continuous deployment pipeline without Launch Engineering involvement
- Partner across the Inference organization with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems
Requirements
- 5+ years of experience building deployment, release, or delivery infrastructure at scale
- Strong software engineering skills with experience designing systems that manage complex state machines and multi-stage pipelines
- Experience with deployment systems where resource constraints shape the design — e.g., fleet capacity, network bandwidth, hardware availability, or coordinated rollout windows
- A track record of building automation that measurably improves deployment velocity and reliability
- Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration
- Comfort working across the stack — from backend services and databases to CLI tools and web UIs
- Strong communication skills and the ability to work closely with oncall engineers, model teams, and infrastructure partners
Strong candidates may also have
- Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)
- Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)
- Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback
- Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)
- Experience with Python and/or Rust in production systems
Salary
- Annual Salary: $320,000 - $485,000 USD
Logistics
- Education requirements: Bachelor's degree in a related field or equivalent experience
- Location-based hybrid policy: staff expected to be in one of Anthropic's offices at least 25% of the time
Visa sponsorship
- Anthropic states they sponsor visas and will make reasonable efforts to obtain a visa for successful candidates; an immigration lawyer is retained to assist
Benefits / Other
- Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration
How we're different
- Anthropic emphasizes large-scale, collaborative research with strong communication and cross-team work; candidates are encouraged to read Anthropic research and apply even if they don't meet every listed qualification.