Staff / Senior Software Engineer, Inference

USD 300,000-485,000 per year
SENIOR
✅ Hybrid
✅ Visa Sponsorship

Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes @ 4, Python @ 6, GCP @ 4, Algorithms @ 4, Distributed Systems @ 4, Machine Learning @ 4, AWS @ 4, Azure @ 4, Communication @ 4, Rust @ 6, LLM @ 7, Observability @ 4, AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Inference team builds and maintains the systems that serve Claude to millions of users worldwide, operating a compute-agnostic inference stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators and cloud platforms.

Responsibilities

  • Build and maintain high-performance, large-scale distributed systems for model serving and inference.
  • Maximize compute efficiency across heterogeneous accelerator fleets and cloud platforms.
  • Design and implement intelligent request routing, load balancing, and traffic management to optimize how requests are distributed across the fleet.
  • Autoscale compute fleets and manage production-grade deployment pipelines for releasing models to millions of users.
  • Integrate new AI accelerator platforms and support inference for new model architectures.
  • Implement inference optimizations such as batching, caching, and structured sampling.
  • Analyze observability data and production telemetry to tune performance and reliability.
  • Support multi-region deployments and geographic routing for global customers.

Requirements

  • Significant software engineering experience, particularly with distributed systems.
  • Experience implementing and deploying machine learning systems at scale.
  • Experience or strong interest in LLM inference optimization, batching, and caching strategies.
  • Familiarity with load balancing, request routing, traffic management, and autoscaling.
  • Experience with Kubernetes and cloud infrastructure (AWS, GCP, Azure).
  • Proficiency in Python or Rust.
  • Results-oriented and collaborative (including pair programming), with an interest in ML systems and infrastructure.
  • Education: at least a Bachelor's degree in a related field or equivalent experience.

Representative projects

  • Designing intelligent routing algorithms across thousands of accelerators.
  • Autoscaling compute fleets to match supply with demand across production and research workloads.
  • Building production deployment pipelines for large-scale model releases.
  • Integrating new AI accelerator platforms and contributing inference features (e.g., prompt caching, structured sampling).
  • Supporting inference for new model architectures and managing multi-region deployments.

Compensation

  • Annual salary range: $300,000 - $485,000 USD.

Logistics

  • Locations: San Francisco, CA; New York City, NY; Seattle, WA (United States).
  • Location-based hybrid policy: staff are expected to be in one of Anthropic's offices at least 25% of the time.
  • Visa sponsorship: Anthropic states that it sponsors visas and will make reasonable efforts, including retaining an immigration lawyer, to assist; however, not every role or candidate can be successfully sponsored.
  • Applications are reviewed on a rolling basis.

How Anthropic describes culture and benefits

  • Collaborative, research-driven environment with an emphasis on communication and big-science research.
  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration.