Engineering Manager, Inference Routing and Performance

USD 405,000-485,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Required Skills & Competences

Machine Learning @ 3, Hiring @ 3, Communication @ 6, Networking @ 3, Debugging @ 3, API @ 3, Engineering Management @ 5, LLM @ 3, GPU @ 2, AI @ 3

Details

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. The Inference Routing team builds the cluster-level routing and coordination plane for Anthropic's inference fleet: the system that sits between the API surface and the inference engines and makes fleet-wide efficiency decisions in real time. Its routing decisions account for caching, accelerator suitability, and in-flight work, with the goal of maximizing throughput while meeting latency SLOs.

Representative work

  • Decide whether a proposed routing algorithm change is worth the deploy risk, given its modeled throughput gains and blast radius
  • Sequence competing priorities (e.g., KV-cache offload, new coordination protocol, model launches)
  • Debug persistent tail-latency regressions from fleet-level metrics down to kernel/network/framework issues
  • Build quantitative cases for cross-team protocol changes and present them to peer teams
  • Run post-incident reviews and turn them into lasting process changes
  • Interview and evaluate candidates with deep systems and scheduler experience

Responsibilities

Drive system-level performance

  • Own the technical roadmap for cluster-level inference efficiency: routing decisions, cache placement and eviction, cross-replica coordination, and synchronization protocols
  • Partner with inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins and turn them into measurable shipped improvements
  • Build and enforce quantitative performance modeling practices: claim a win only when it has been measured, and know the expected effect of a change before shipping it

Deliver reliably and operate cleanly

  • Set technical strategy for routing across heterogeneous hardware (GPUs, TPUs, Trainium) and serving surfaces
  • Run the team's operational backbone: on-call rotation, incident response, postmortems, and deploy safety
  • Clarify dependencies and commitments between API surface, inference engines, and cloud deployment teams

Build and grow the team

  • Develop, retain, and hire a strong team that can operate at OS and framework levels when required
  • Coach engineers through shifting priorities driven by model launches, hardware changes, and scaling demands
  • Step in to unblock critical deploys or synthesize design debates when necessary

Requirements

  • 5+ years of engineering management experience, ideally with part of that time spent leading critical-path production infrastructure at scale
  • Deep systems background (examples: load balancing, scheduling, cache-coherent distributed state, high-performance networking) sufficient to make architectural calls and evaluate kernel/framework-level work
  • Experience shipping performance improvements in large-scale systems with measurable impact
  • Experience running production infrastructure with operational stakes: on-call, incident response, capacity events, and deploy discipline
  • Results-oriented with a bias toward impact; able to balance throughput, latency, stability, and feature velocity
  • Strong cross-team communication and collaboration skills
  • Curious about machine learning systems; willing to learn transformer inference and its systems implications

Strong candidates may also have

  • Experience with LLM inference serving: KV caching, continuous batching, request scheduling, prefill/decode disaggregation
  • Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale
  • Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and workload placement trade-offs
  • Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging
  • Experience leading teams at supercomputing or hyperscaler infrastructure scale, or through periods of rapid growth

Compensation

  • Annual Salary: $405,000 - $485,000 USD

Logistics

  • Education: At least a Bachelor's degree in a related field or equivalent experience
  • Location: San Francisco, CA or New York City, NY (location-based hybrid policy: staff expected to be in an office at least 25% of the time)
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist where possible

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours and an office space for collaboration

How we're different

Anthropic emphasizes large-scale collaborative AI research, communication skills, and impact-driven work. The company encourages applicants who may not meet every listed qualification to apply, and it highlights diversity and inclusion in hiring.