Staff + Senior Software Engineer, Inference Deployment

USD 320,000-485,000 per year
SENIOR
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Kubernetes @ 6, Python @ 4, Communication @ 7, Rust @ 4, GPU @ 4, Observability @ 4, AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Launch Engineering team makes inference deployment boring and unattended by designing and building deployment infrastructure that moves inference code from merge to production across resource-constrained accelerator fleets (GPU, TPU, Trainium). This role focuses on orchestration, capacity-aware scheduling, observability, and pipeline architectures that reduce cycle time and minimize disruption to serving capacity.

Responsibilities

  • Own deployment orchestration that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions
  • Improve capacity-aware deployment scheduling to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes
  • Extend deployment observability: dashboards and tooling that answer "what code is running in production?", "where is my commit?", and "what validation passed for this deploy?"
  • Drive down cycle time from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism
  • Optimize fleet rollout strategies for large-scale deployments across thousands of accelerator chips, minimizing disruption to serving capacity
  • Evolve self-service model onboarding so new models can be added to the continuous deployment pipeline without Launch Engineering involvement
  • Partner across the Inference organization with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems

Minimum qualifications

  • Strong software engineering skills, including experience designing systems that manage complex state machines and multi-stage pipelines
  • Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration
  • Experience building deployment, release, or delivery infrastructure where resource constraints (fleet capacity, network bandwidth, hardware availability, coordinated rollout windows) shape the design
  • A track record of building automation that measurably improves deployment velocity and reliability
  • Comfort working across the stack, from backend services and databases to CLI tools and web UIs
  • Strong communication skills and the ability to work closely with on-call engineers, model teams, and infrastructure partners

Preferred qualifications

  • 5+ years of experience building deployment, release, or delivery infrastructure at scale
  • Experience with Python and/or Rust in production systems
  • Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)
  • Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)
  • Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback
  • Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)

Compensation

Annual Salary: $320,000 - $485,000 USD

Logistics

  • Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
  • Location-based hybrid policy: currently, staff are expected to be in one of the offices at least 25% of the time (some roles may require more time in office)
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist candidates who receive an offer

How we're different

Anthropic works as a cohesive team on a few large-scale research efforts, values communication, and prioritizes impact toward steerable, trustworthy AI. The team is collaborative and frequently hosts research discussions.

Application notes

The posting encourages applicants who may not meet every qualification to apply and includes candidate guidance about AI usage in the application process.