Staff+ Software Engineer, Inference Runtime

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 405,000-485,000 per year

SENIOR

✅ Hybrid

✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 3, Python @ 4, CI/CD @ 4, Distributed Systems @ 4, AWS @ 4, Communication @ 7, Prioritization @ 4, Rust @ 4, Debugging @ 4, CUDA @ 4, GPU @ 4, AI @ 4, Profiling @ 4

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Inference organization serves Claude to millions of users and enterprise customers with the speed, reliability, and efficiency that frontier AI demands. This role is a technical lead for Inference Runtime: the team that owns the shared, accelerator-agnostic core of Anthropic's inference serving stack.

About the role

This is a senior individual-contributor role with broad technical ownership. You will set technical direction for the runtime's architecture, its release and validation systems, and the workflows engineers use to develop on top of it. You will partner across Inference and with central Infrastructure to make decisions about boundaries, prioritization, and tradeoffs across heterogeneous accelerator platforms (GPU, TPU, Trainium). You will work hands-on in a performance-sensitive Rust and Python codebase and mentor engineers on the team.

Responsibilities

Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack
Own and evolve the accelerator-agnostic runtime: interfaces, internal boundaries, and build structure, including hands-on work in a performance-sensitive Rust and Python codebase
Keep the cost of platform expansion low by isolating accelerator-specific specialization and keeping edge cases out of the shared core
Drive efficient accelerator usage: utilization, scheduling, and memory management across GPU, TPU, and Trainium
Build the runtime's validation surface around partitioned builds, change-scoped testing, and canary/shadow/rollback mechanisms
Act as a technical counterpart to central Infrastructure on compilers, build systems, and toolchains; decide when to build vs. adopt
Mentor engineers through design and code review and direct collaboration

Minimum qualifications

Deep background in systems engineering or ML infrastructure, including performance profiling, latency and throughput optimization, and systems debugging at scale
Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron) and a desire to keep the runtime agnostic across them
Significant software engineering experience with high-performance, large-scale distributed systems serving millions of users
Experience defining and using engineering metrics (SLOs) to drive measurable improvements in escape rates, release times, latency, or throughput
Experience driving technical alignment across organizational boundaries, with strong written and verbal communication skills

Preferred qualifications

8+ years of software engineering experience, with significant time as a technical lead or anchor on a platform, inference runtime, or ML infrastructure team
Experience with ML compiler toolchains (XLA, Triton, NeuronX) or accelerator driver/firmware management at scale
Background operating production as a validation surface at scale: shadow traffic, canary populations, automated baseline comparison, fast rollback
Experience with deterministic or simulation-based testing for hardware-dependent systems
Experience with CI/CD systems at scale for accelerator workloads
Familiarity with Kubernetes-based development and job scheduling environments

Compensation

Annual Salary: $405,000 - $485,000 USD

Logistics

Locations: Remote-friendly; San Francisco, CA; Seattle, WA; New York City, NY
Minimum education: Bachelor's degree or equivalent experience
Location-based hybrid policy: staff are expected to be in one of the offices at least 25% of the time
Visa sponsorship: Anthropic states that it sponsors visas and will make reasonable efforts to assist with immigration if an offer is made

Technologies and topics mentioned

Rust, Python, GPUs, CUDA, TPU, Trainium, AWS Neuron, ML compiler toolchains (XLA, Triton, NeuronX), performance profiling, latency and throughput optimization, distributed systems, SLOs/metrics, CI/CD, Kubernetes, build systems, canary/shadow/rollback, validation/testing strategies