Staff Engineer, Datacenter Server Lifecycle

USD 320,000-405,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

IaC

Required Skills & Competences

Security @ 3, Go @ 5, Kubernetes @ 3, Python @ 5, GCP @ 3, Java @ 5, AWS @ 3, Communication @ 3, Networking @ 3, Planning @ 3, Rust @ 5, GPU @ 3, AI @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Role overview

As a Staff Engineer on the Datacenter Server Lifecycle team, you will own the end-to-end operational journey of every machine in our facilities — from initial provisioning and deployment, across its working life, through maintenance and refresh, and all the way to decommissioning. This is greenfield work: you will help define processes, tooling, and operational standards that govern how we run and retire hardware at scale. A distinguishing aspect of this role is its deep intersection with security: machines handle sensitive workloads, and ensuring each machine is trusted, attested, and operating with a verified chain of integrity from hardware up is a core part of the job. You will partner closely with Infrastructure Security and Networking teams.

Responsibilities

  • Lead the build-out of automation to support datacenters containing tens of thousands of servers.
  • Define and own the end-to-end server lifecycle strategy — provisioning, deployment, operation, maintenance, refresh, and decommissioning — and maintain automation and operational procedures for common lifecycle events (hardware failures, firmware upgrades, fleet rotations).
  • Partner closely with Infrastructure Security to design and enforce trusted compute standards across the server lifecycle (secure provisioning through end-of-life handling).
  • Work with the Networking team to ensure end-to-end connectivity across all sites.
  • Build and maintain tooling to track machine health, configuration, and operational status across the full datacenter fleet.

Minimum qualifications

  • Hands-on experience with server hardware, including rack deployment, cabling, troubleshooting, and understanding failure modes at scale.
  • End-to-end understanding of hardware lifecycle management: asset tracking, provisioning workflows, maintenance scheduling, and decommissioning practices.
  • Proficiency in at least one programming language (e.g., Python, Rust, Go, or Java).
  • Working knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.
  • Ability to communicate clearly and build consensus with a wide range of stakeholders.
  • Comfort navigating ambiguity and making progress on complex, cross-functional problems.
  • Willingness to travel occasionally to datacenter sites across North America.

Preferred qualifications

  • 8+ years of experience in datacenter operations, hardware infrastructure management, or a closely related discipline.
  • Hands-on experience with GPU or AI accelerator hardware (e.g., NVIDIA A100/H100, AMD MI300, Google TPUs, or AWS Trainium) and an understanding of their operational demands.
  • Familiarity with modern provisioning tooling such as coreboot, LinuxBoot, or u-root.
  • Experience building or contributing to datacenter automation or fleet management platforms.
  • Experience building and deploying server operating system distributions across thousands of hosts.
  • Background in large-scale capacity planning and hardware refresh strategy, ideally at a hyperscaler or large cloud provider.
  • Experience with trusted compute and hardware security concepts such as secure boot, TPM, hardware attestation, and firmware verification — or a strong desire to develop deep expertise in this area.

Compensation

Annual Salary: $320,000 - $405,000 USD

Logistics

  • Minimum education: Bachelor’s degree or equivalent combination of education, training, and/or experience.
  • Location-based hybrid policy: Anthropic currently expects all staff to be in one of its offices at least 25% of the time (some roles may require more office time).
  • Visa sponsorship: Anthropic states that it sponsors visas and will make reasonable efforts to obtain a visa for anyone who receives an offer; it retains an immigration lawyer to help.

How we're different

Anthropic works as a cohesive team on a few large-scale research efforts and values communication and collaboration. The role is embedded in large-scale AI infrastructure work and intersects with research and security efforts.

How to apply

Applications are handled through Anthropic's careers/job portal. The posting asks for a resume or LinkedIn profile and includes standard application questions (location, visa needs, availability, etc.).