Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Machine Learning @ 3
Hiring @ 3
Communication @ 6
Networking @ 3
Debugging @ 3
API @ 3
Engineering Management @ 5
LLM @ 3
GPU @ 2
AI @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Inference Routing team builds the cluster-level routing and coordination plane for Anthropic's inference fleet — the system between the API surface and the inference engines that makes fleet-wide efficiency decisions in real time. The team focuses on routing decisions that account for caching, accelerator suitability, and in-flight work to maximize throughput and meet latency SLOs.
Representative work
- Decide whether a proposed routing algorithm change is worth the deploy risk, given modeled throughput gains and blast radius
- Sequence competing priorities (e.g., KV-cache offload, new coordination protocol, model launches)
- Debug persistent tail-latency regressions from fleet-level metrics down to kernel/network/framework issues
- Build quantitative cases for cross-team protocol changes and present them to peer teams
- Run post-incident reviews and turn them into lasting process changes
- Interview and evaluate candidates with deep systems and scheduler experience
Responsibilities
Drive system-level performance
- Own the technical roadmap for cluster-level inference efficiency: routing decisions, cache placement and eviction, cross-replica coordination, and synchronization protocols
- Partner with inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins and turn them into measurable shipped improvements
- Build and enforce quantitative performance modeling practices: claim wins only when they are measurable, and know the expected effects before shipping
Deliver reliably and operate cleanly
- Set technical strategy for routing across heterogeneous hardware (GPUs, TPUs, Trainium) and serving surfaces
- Run the team's operational backbone: on-call rotation, incident response, postmortems, and deploy safety
- Clarify dependencies and commitments between API surface, inference engines, and cloud deployment teams
Build and grow the team
- Develop, retain, and hire a strong team that can operate at OS and framework levels when required
- Coach engineers through shifting priorities driven by model launches, hardware changes, and scaling demands
- Step in to unblock critical deploys or synthesize design debates when necessary
Requirements
- 5+ years of engineering management experience, ideally with part of that leading critical-path production infrastructure at scale
- Deep systems background (examples: load balancing, scheduling, cache-coherent distributed state, high-performance networking) sufficient to make architectural calls and evaluate kernel/framework-level work
- Experience shipping performance improvements in large-scale systems with measurable impact
- Experience running production infrastructure with operational stakes: on-call, incident response, capacity events, and deploy discipline
- Results-oriented with a bias toward impact; able to balance throughput, latency, stability, and feature velocity
- Strong cross-team communication and collaboration skills
- Curious about machine learning systems; willing to learn transformer inference and its systems implications
Strong candidates may also have
- Experience with LLM inference serving: KV caching, continuous batching, request scheduling, prefill/decode disaggregation
- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale
- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and workload placement trade-offs
- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging
- Experience leading teams at supercomputing or hyperscaler infrastructure scale, or through periods of rapid growth
Compensation
- Annual Salary: $405,000 - $485,000 USD
Logistics
- Education: At least a Bachelor's degree in a related field or equivalent experience
- Location: San Francisco, CA or New York City, NY (location-based hybrid policy: staff expected to be in an office at least 25% of the time)
- Visa sponsorship: Anthropic states they sponsor visas and retain an immigration lawyer to assist where possible
Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours and an office space for collaboration
How we're different
Anthropic emphasizes large-scale collaborative AI research, communication skills, and impact-driven work. They encourage applicants who may not meet every listed qualification to apply and highlight diversity and inclusion in hiring.