Used Tools & Technologies
Not specified
Required Skills & Competences
- Algorithms (level 3)
- Communication (level 3)
- Debugging (level 3)
Details
Anthropic is building reliable, interpretable, and steerable AI systems and seeks a TPU Kernel Engineer to identify and address performance issues across research, training, and inference ML systems. A significant portion of the role involves designing and optimizing kernels for TPUs and giving researchers feedback on how model changes affect performance. The role values experience with large-scale systems problems and low-level optimization, close collaboration (including pair programming), and attention to the societal impacts of AI work.
Responsibilities
- Identify and address performance issues across ML systems used for research, training, and inference.
- Design and implement optimized kernels for TPUs (and other accelerators where relevant).
- Provide feedback to researchers about model changes and their performance impact.
- Collaborate closely with researchers and engineers, including pair programming.
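Adapting models for low-precision inference (one of the representative projects below) typically starts with weight quantization. A minimal sketch of symmetric per-tensor int8 quantization in plain Python; the function names and the per-tensor scheme are illustrative choices, not a description of any Anthropic codebase:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats onto the
    integer range [-127, 127] using a single scale factor."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]
```

Real deployments usually refine this with per-channel scales and calibration data, but the round-trip error of even this sketch is bounded by half the scale for in-range values.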
Requirements
- Significant experience optimizing ML systems for TPUs, GPUs, or other accelerators.
- Track record of solving large-scale systems and low-level optimization problems.
- Experience designing and implementing kernels for TPUs or other ML accelerators.
- Deep understanding of accelerators (for example, background in computer architecture).
- Familiarity with ML framework internals.
- Experience or familiarity with language modeling using transformers.
- Ability to build quantitative performance models and debug at low levels (including assembly-level performance debugging).
- At least a Bachelor's degree in a related field or equivalent experience.
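Building quantitative models of system performance, as the requirements above ask, often starts from a roofline estimate: a kernel's runtime is bounded below by whichever is slower, compute or memory traffic. A minimal sketch in plain Python; the accelerator numbers are made-up placeholders, not real TPU specifications:

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, mem_bw_bytes_per_s):
    """Lower-bound execution time: the kernel is limited by either
    compute throughput or memory bandwidth, whichever is slower."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / mem_bw_bytes_per_s
    return max(compute_time, memory_time)

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs per byte moved; comparing this against
    peak_flops / bandwidth tells you whether a kernel is
    compute-bound or memory-bound."""
    return flops / bytes_moved

# Illustrative (made-up) accelerator numbers.
PEAK = 100e12  # 100 TFLOP/s
BW = 1e12      # 1 TB/s bandwidth to main memory

# A bf16 matmul (M, K) @ (K, N) performs 2*M*K*N FLOPs and, at minimum,
# moves A, B, and C through memory once at 2 bytes per element.
M = K = N = 4096
flops = 2 * M * K * N
bytes_moved = 2 * (M * K + K * N + M * N)
```

At these sizes the matmul's arithmetic intensity far exceeds the machine balance `PEAK / BW`, so the roofline predicts it is compute-bound; small or skinny matmuls flip to the memory-bound side.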
Representative projects (examples of work you may do)
- Implement low-latency, high-throughput sampling for large language models.
- Adapt models for low-precision inference.
- Build quantitative models of system performance.
- Design and implement custom collective communication algorithms.
- Debug kernel performance at the assembly level.
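Custom collective communication algorithms like those above are often prototyped as plain-Python simulations before being mapped to real interconnects. A sketch of the classic bandwidth-optimal ring all-reduce, simulated with per-rank chunk lists; the data layout and function name are illustrative assumptions:

```python
def ring_all_reduce(inputs):
    """Simulate a ring all-reduce over n ranks.

    `inputs` is a list of n per-rank vectors, each pre-split into n
    numeric chunks. Returns the per-rank results; afterwards every rank
    holds the elementwise sum of all inputs. Each rank moves 2*(n-1)/n
    of the data, the standard bandwidth-optimal bound.
    """
    n = len(inputs)
    chunks = [list(v) for v in inputs]  # chunks[rank][chunk_index]

    # Phase 1, reduce-scatter: in n-1 steps each rank forwards one chunk
    # to its right neighbor, which accumulates it. Sends happen
    # simultaneously, so snapshot the outgoing values first.
    for step in range(n - 1):
        sent = [chunks[r][(r - step) % n] for r in range(n)]
        for r in range(n):
            chunks[(r + 1) % n][(r - step) % n] += sent[r]

    # After phase 1, rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: circulate those reduced chunks so every rank
    # ends with the complete result.
    for step in range(n - 1):
        sent = [chunks[r][(r + 1 - step) % n] for r in range(n)]
        for r in range(n):
            chunks[(r + 1) % n][(r + 1 - step) % n] = sent[r]

    return chunks
```

On real hardware the same schedule is expressed with point-to-point sends between neighboring chips, and variants (tree, 2D-torus) trade latency against bandwidth depending on the topology.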
Logistics & Other Details
- Locations: San Francisco, CA; New York City, NY; Seattle, WA.
- Location-based hybrid policy: staff are expected to be in an office at least 25% of the time (some roles may require more time in office).
- Visa sponsorship: Anthropic does sponsor visas and retains an immigration lawyer; sponsorship is subject to role and candidate specifics.
- Encouragement to apply even if you do not meet every qualification; Anthropic highlights diversity and inclusion.
Benefits
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
- Anthropic provides guidance on candidates' use of AI tools during the application process.