Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 7
Go @ 7
Kubernetes @ 4
Python @ 7
GCP @ 4
Java @ 7
Hiring @ 4
AWS @ 4
Communication @ 4
Networking @ 4
Planning @ 4
Rust @ 7
GPU @ 4
AI @ 4
Details
Anthropic is expanding beyond cloud infrastructure and is hiring a Senior Engineer on the Datacenter Machine Lifecycle team to own the end-to-end operational journey of every machine in their facility — from provisioning and deployment, through operation, maintenance, refresh, and decommissioning. This is greenfield work that requires defining processes, tooling, and operational standards for running and retiring hardware at scale, with a strong emphasis on trusted compute and hardware security.
Responsibilities
- Lead the build-out of automation to support datacenters containing tens of thousands of servers.
- Own and define the end-to-end machine lifecycle strategy (provisioning, deployment, operation, maintenance, refresh, decommissioning) and maintain automation and operational procedures for lifecycle events (e.g., hardware failures, firmware upgrades, fleet rotations).
- Partner closely with Infrastructure Security to design and enforce trusted compute standards across the machine lifecycle.
- Work closely with the Networking team to ensure end-to-end connectivity across all sites.
- Build and maintain tooling to track machine health, configuration, and operational status across the full datacenter fleet.
Requirements
- 5+ years of experience in datacenter operations, hardware infrastructure management, or a closely related discipline.
- Deep, hands-on experience with server hardware, including rack deployment, cabling, troubleshooting, and understanding failure modes at scale.
- Understanding of hardware lifecycle management end-to-end: asset tracking, provisioning workflows, maintenance scheduling, and decommissioning practices.
- Strong proficiency in at least one programming language (e.g., Python, Rust, Go, or Java).
- Comfortable navigating ambiguity and working independently to drive progress on complex, cross-functional problems.
- Clear communication skills and ability to build consensus with a wide range of stakeholders.
- Working knowledge of modern cloud infrastructure (Kubernetes, Infrastructure as Code, AWS, GCP).
- Willingness to occasionally travel to datacenter sites across North America.
Strong Candidates May Also Have
- Hands-on experience with GPU or AI accelerator hardware (e.g., NVIDIA A100/H100, AMD MI300, Google TPUs, AWS Trainium).
- Familiarity with modern provisioning tooling such as coreboot, LinuxBoot, or u-root.
- Experience building or contributing to datacenter automation or fleet management platforms.
- Experience building and deploying server operating system distributions across thousands of hosts.
- Background in large-scale capacity planning and hardware refresh strategy (hyperscaler or large cloud provider experience).
- Experience or strong interest in trusted compute and hardware security concepts (secure boot, TPM, hardware attestation, firmware verification).
Compensation
- Annual Salary: £255,000 - £325,000 GBP
Logistics
- Location: London, United Kingdom.
- Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time.
- Education: a Bachelor's degree in a related field, or equivalent experience, is required.
- Visa sponsorship: Anthropic states that it sponsors visas and retains immigration legal support, though sponsorship is not guaranteed for every role or candidate.