Staff + Senior Software Engineer, Inference Deployment

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 320,000-485,000 per year

SENIOR

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About the proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others, and you know all the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 6, Python @ 4, Communication @ 7, Rust @ 4, GPU @ 4, Observability @ 4, AI @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Launch Engineering team makes inference deployment boring and unattended by designing and building deployment infrastructure that moves inference code from merge to production across resource-constrained accelerator fleets (GPU, TPU, Trainium). This role focuses on orchestration, capacity-aware scheduling, observability, and pipeline architectures that reduce cycle time and minimize disruption to serving capacity.

Responsibilities

Own deployment orchestration that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions
Improve capacity-aware deployment scheduling to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes
Extend deployment observability — dashboards and tooling that answer "what code is running in production," "where is my commit," and "what validation passed for this deploy"
Drive down cycle time from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism
Optimize fleet rollout strategies for large-scale deployments across thousands of accelerator chips, minimizing disruption to serving capacity
Evolve self-service model onboarding so new models can be added to the continuous deployment pipeline without Launch Engineering involvement
Partner across the Inference organization with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems

Minimum qualifications

Strong software engineering skills, including experience designing systems that manage complex state machines and multi-stage pipelines
Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration
Experience building deployment, release, or delivery infrastructure where resource constraints (fleet capacity, network bandwidth, hardware availability, coordinated rollout windows) shape the design
A track record of building automation that measurably improves deployment velocity and reliability
Comfort working across the stack — from backend services and databases to CLI tools and web UIs
Strong communication skills and the ability to work closely with oncall engineers, model teams, and infrastructure partners

Preferred qualifications

5+ years of experience building deployment, release, or delivery infrastructure at scale
Experience with Python and/or Rust in production systems
Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)
Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)
Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback
Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)

Compensation

Annual Salary: $320,000 - $485,000 USD

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Location-based hybrid policy: currently, staff are expected to be in one of the offices at least 25% of the time (some roles may require more time in office)
Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist once it makes an offer

How we're different

Anthropic works as a single cohesive team on a few large-scale research efforts, values communication, and prioritizes impact in advancing steerable, trustworthy AI. The team is collaborative and frequently hosts research discussions.

Application notes

The posting encourages applicants who may not meet every qualification to apply and includes candidate guidance about AI usage in the application process.