Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 3
Kubernetes @ 3
GCP @ 3
CI/CD @ 3
Distributed Systems @ 5
AWS @ 3
Azure @ 3
GPU @ 3
Observability @ 2
AI @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
Responsibilities
Developer Productivity & Tooling
- Drive cross-functional programs to improve developer environments, CI/CD infrastructure, and release processes that enable rapid innovation while maintaining high security standards
- Coordinate large-scale migrations and platform modernization efforts across engineering teams
- Partner with teams to measure and improve developer productivity metrics, identifying bottlenecks and driving systematic improvements
- Lead initiatives to integrate AI tools into development workflows, helping Anthropic be at the forefront of AI-assisted research and engineering
Infrastructure Reliability & Operations
- Drive programs to establish and achieve reliability targets across training infrastructure and production services
- Coordinate incident response improvements, post-mortem processes, and on-call rotations that help teams operate effectively
- Establish metrics and dashboards to track infrastructure health, capacity utilization, and operational excellence
Cross-functional Coordination
- Serve as the critical bridge between infrastructure teams, research, and product, translating technical complexities into clear updates for a variety of audiences
- Consult with stakeholders to deeply understand infrastructure, data, and compute needs, identifying solutions to support frontier research and product development
- Drive alignment on priorities and timelines across teams with competing constraints
Requirements
- 5+ years of technical program management experience, with a track record of successfully delivering complex infrastructure programs in ML/AI systems or large-scale distributed systems
- Deep technical understanding of infrastructure systems—enough to engage substantively with engineers, identify technical risks, and add value beyond project tracking
- Ability to create structure and processes in ambiguous environments, bringing clarity to complex cross-team initiatives
- Strong stakeholder management skills and ability to build trust with both technical and non-technical partners
- Comfortable navigating competing priorities and using data to drive technical decisions
- Experience with developer productivity initiatives, CI/CD systems, or infrastructure scaling
- Passion for reliability, scalability, security, and continuous improvement
- Passion for supporting internal partners such as research and for understanding their unique needs
- Passion for AI infrastructure and an understanding of the unique challenges of building and operating systems at frontier scale
- Experience with Kubernetes, cloud platforms (AWS, GCP, Azure), and ML infrastructure (GPU/TPU/Trainium clusters)
- Background working with research teams and translating their needs into concrete technical requirements
- Experience driving adoption of AI tools to improve engineering productivity
- Familiarity with observability tooling and practices
Logistics
- Education requirements: at least a Bachelor's degree in a related field or equivalent experience
- Location-based hybrid policy: currently, staff are expected to be in one of Anthropic's offices at least 25% of the time (some roles may require more time in-office)
- Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to help, though sponsorship is not guaranteed for every role or candidate
- Deadline to apply: None (applications accepted on a rolling basis)
Compensation
- Annual Salary: $290,000 - $365,000 USD
About Anthropic / How we're different
- We believe the highest-impact AI research will be big science, and we work as a single cohesive team on a few large-scale research efforts
- We value impact over work on smaller, more specific puzzles, and host frequent research discussions to pursue high-impact work
- Research directions include topics related to GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences
Benefits
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and an office space for collaboration
Other notes
- Applicants are encouraged to apply even if they do not meet every qualification listed
- Guidance on candidate AI usage is provided (link in original posting)
- Anthropic recruiters contact candidates only from @anthropic.com addresses and will not ask for money or banking information before the first day