Technical Program Manager, Infrastructure

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 290,000-365,000 per year

MIDDLE

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage and the ability to handle common tasks and challenges related to the technology;

7-9: expert. You can teach others and know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 3, Kubernetes @ 3, GCP @ 3, CI/CD @ 3, Distributed Systems @ 5, AWS @ 3, Azure @ 3, GPU @ 3, Observability @ 2, AI @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Responsibilities

Developer Productivity & Tooling

Drive cross-functional programs to improve developer environments, CI/CD infrastructure, and release processes that enable rapid innovation while maintaining high security standards
Coordinate large-scale migrations and platform modernization efforts across engineering teams
Partner with teams to measure and improve developer productivity metrics, identifying bottlenecks and driving systematic improvements
Lead initiatives to integrate AI tools into development workflows, helping Anthropic be at the forefront of AI-assisted research and engineering

Infrastructure Reliability & Operations

Drive programs to establish and achieve reliability targets across training infrastructure and production services
Coordinate incident response improvements, post-mortem processes, and on-call rotations that help teams operate effectively
Establish metrics and dashboards to track infrastructure health, capacity utilization, and operational excellence

Cross-functional Coordination

Serve as the critical bridge between infrastructure teams, research, and product, translating technical complexities into clear updates for a variety of audiences
Consult with stakeholders to deeply understand infrastructure, data, and compute needs, identifying solutions to support frontier research and product development
Drive alignment on priorities and timelines across teams with competing constraints

Requirements

5+ years of technical program management experience, with a track record of successfully delivering complex infrastructure programs in ML/AI systems or large-scale distributed systems
Deep technical understanding of infrastructure systems—enough to engage substantively with engineers, identify technical risks, and add value beyond project tracking
Ability to create structure and processes in ambiguous environments, bringing clarity to complex cross-team initiatives
Strong stakeholder management skills and ability to build trust with both technical and non-technical partners
Comfortable navigating competing priorities and using data to drive technical decisions
Experience with developer productivity initiatives, CI/CD systems, or infrastructure scaling
Passion for reliability, scalability, security, and continuous improvement
Passion for supporting internal partners, such as research teams, and understanding their unique needs
Passion for AI infrastructure and an understanding of the unique challenges of building and operating systems at frontier scale
Experience with Kubernetes, cloud platforms (AWS, GCP, Azure), and ML infrastructure (GPU/TPU/Trainium clusters)
Background working with research teams and translating their needs into concrete technical requirements
Experience driving adoption of AI tools to improve engineering productivity
Familiarity with observability tooling and practices

Logistics

Education requirements: at least a Bachelor's degree in a related field or equivalent experience
Location-based hybrid policy: currently, staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more time in-office)
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to help, though sponsorship is not guaranteed for every role or candidate
Deadline to apply: None (applications accepted on a rolling basis)

Compensation

Annual Salary: $290,000 - $365,000 USD

About Anthropic / How we're different

We believe the highest-impact AI research will be big science, so we work as a single cohesive team on a few large-scale research efforts
We value impact over work on smaller, more specific puzzles, and host frequent research discussions to make sure we pursue the highest-impact work
Research directions include topics related to GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences

Benefits

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office space for collaboration

Other notes

Applicants are encouraged to apply even if they do not meet every qualification listed
Guidance on candidate AI usage is provided (link in original posting)
Anthropic recruiters contact candidates only from @anthropic.com email addresses and will not ask for money or banking information before the first day