Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Kubernetes @ 3
Python @ 4
CI/CD @ 4
Distributed Systems @ 4
AWS @ 4
Communication @ 7
Prioritization @ 4
Rust @ 4
Debugging @ 4
CUDA @ 4
GPU @ 4
AI @ 4
Profiling @ 4
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Inference organization serves Claude to millions of users and enterprise customers with the speed, reliability, and efficiency that frontier AI demands. This role is a technical lead for Inference Runtime: the team that owns the shared, accelerator-agnostic core of Anthropic's inference serving stack.
About the role
This is a senior individual-contributor role with broad technical ownership. You will set technical direction for the runtime's architecture, its release and validation systems, and the workflows engineers use to develop on top of it. You will partner across Inference and with central Infrastructure to make decisions about boundaries, prioritization, and tradeoffs across heterogeneous accelerator platforms (GPU, TPU, Trainium). You will work hands-on in a performance-sensitive Rust and Python codebase and mentor engineers on the team.
Responsibilities
- Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack
- Own and evolve the accelerator-agnostic runtime: interfaces, internal boundaries, and build structure, including hands-on work in a performance-sensitive Rust and Python codebase
- Ensure platform expansion costs remain low by isolating platform-specific specialization and keeping edge cases out of the core
- Drive efficient accelerator usage: utilization, scheduling, and memory management across GPU, TPU, and Trainium
- Build the runtime's validation surface around partitioned builds, change-scoped testing, and canary/shadow/rollback mechanisms
- Act as a technical counterpart to central Infrastructure on compilers, build systems, and toolchains; decide when to build vs. adopt
- Mentor engineers through design and code review and direct collaboration
Minimum qualifications
- Deep background in systems engineering or ML infrastructure, with hands-on experience in performance profiling, latency and throughput optimization, and systems debugging at scale
- Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron) and a desire to keep the runtime agnostic across them
- Significant software engineering experience with high-performance, large-scale distributed systems serving millions of users
- Experience defining and using engineering metrics (SLOs) to drive measurable improvements in escape rates, release times, latency, or throughput
- Experience driving technical alignment across organizational boundaries and strong written and verbal communication skills
Preferred qualifications
- 8+ years of software engineering experience, with significant time as a technical lead or anchor on a platform, inference runtime, or ML infrastructure team
- Experience with ML compiler toolchains (XLA, Triton, NeuronX) or accelerator driver/firmware management at scale
- Background operating production as a validation surface at scale: shadow traffic, canary populations, automated baseline comparison, fast rollback
- Experience with deterministic or simulation-based testing for hardware-dependent systems
- Experience with CI/CD systems at scale for accelerator workloads
- Familiarity with Kubernetes-based development and job scheduling environments
Compensation
Annual Salary: $405,000 - $485,000 USD
Logistics
- Locations: Remote-friendly; San Francisco, CA; Seattle, WA; New York City, NY
- Minimum education: Bachelor's degree or equivalent experience
- Location-based hybrid policy: staff are expected to be in one of the offices at least 25% of the time
- Visa sponsorship: Anthropic states they do sponsor visas and will make reasonable efforts to assist with immigration if an offer is made
Technologies and topics mentioned
Rust, Python, GPUs, CUDA, TPU, Trainium, AWS Neuron, ML compiler toolchains (XLA, Triton, NeuronX), performance profiling, latency and throughput optimization, distributed systems, SLOs/metrics, CI/CD, Kubernetes, build systems, canary/shadow/rollback, validation/testing strategies