Technical Program Manager, Compute

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 290,000-365,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 2, GCP @ 3, Leadership @ 3, AWS @ 3, Azure @ 3, Communication @ 3, GPU @ 3, Observability @ 3, AI @ 3, Slurm @ 2, HPC @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Compute team runs the compute infrastructure that supports model training, evaluation, and inference workloads.

About the role

As a Technical Program Manager on the Compute team, you will drive planning, coordination, and execution of programs that keep Anthropic's compute infrastructure running efficiently at scale. You will take ownership of critical workstreams across the compute lifecycle, from procurement and bringing capacity online to allocation and utilization across teams. The exact focus will depend on your strengths and the team's needs. You will partner with Infrastructure, Systems, Research, Finance, and Capacity Engineering to shape processes, tooling, and coordination mechanisms.

Responsibilities

Own and drive critical programs across the compute lifecycle, coordinating execution across multiple engineering, research, and operations teams
Build and maintain operational visibility into the compute fleet: supply, demand, utilization, and health
Lead cross-functional coordination for compute transitions: bringing new capacity online, migrating workloads, and managing decommissions across cloud providers and hardware platforms
Partner with engineering and research leadership to prioritize and align on compute planning, allocation, and usage
Identify and close operational gaps via tooling, improved processes, or better cross-team communication
Own trade-off discussions between utilization, cost, latency, and reliability; synthesize inputs from technical and business stakeholders and communicate decisions to leadership
Develop and improve processes and frameworks for planning, tracking, and executing compute programs at scale

Requirements / Qualifications

7+ years of technical program management experience in infrastructure, platform engineering, or compute-intensive environments
Experience leading complex, cross-functional programs involving multiple engineering teams with competing priorities and ambiguous requirements
Experience working with research or ML teams and translating their needs into operational plans and technical requirements
Comfort diving into technical details (cloud infrastructure, cluster management, job scheduling, resource orchestration) while maintaining program-level visibility
Ability to define scope and build processes in ambiguous, fast-moving environments
Strong communication skills and credibility with engineers, researchers, finance, and executive leadership
Track record of building trust with engineering teams and driving changes through influence rather than authority

Strong candidates may also have

Experience managing compute capacity across multiple cloud providers (AWS, GCP, Azure) or hybrid cloud/on-premises environments
Familiarity with job scheduling, resource orchestration, or workload management systems (Kubernetes, Slurm, Borg, YARN, or custom schedulers)
Experience with GPU or accelerator infrastructure and large-scale ML training/inference workloads
Experience building or improving observability for infrastructure systems: dashboards, alerting, efficiency metrics, or cost attribution
Capacity planning experience, including demand forecasting, cost modeling, or hardware lifecycle management
Experience scaling through hypergrowth in AI/ML, HPC, or large-scale cloud environments

Logistics

Locations: San Francisco, CA; New York City, NY; Seattle, WA
Education: Bachelor’s degree in a related field or equivalent experience required
Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more time in office)
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist where possible
Deadline to apply: applications are received on a rolling basis

Compensation & Benefits

Annual salary range: $290,000 - $365,000 USD
Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office workspace

How we're different

Anthropic emphasizes large-scale, collaborative AI research, valuing impact and communication. The organization hosts frequent research discussions and focuses on high-impact research directions (examples listed in the posting include work related to GPT-3, interpretability, scaling laws, and learning from human preferences).