Used Tools & Technologies
Not specified
Required Skills & Competences
- Algorithms (level 3)
- Communication (level 3)
- Debugging (level 3)
Details
Anthropic is building reliable, interpretable, and steerable AI systems and seeks a TPU Kernel Engineer to identify and address performance issues across research, training, and inference ML systems. A significant portion of the role involves designing and optimizing kernels for TPUs and giving researchers feedback on how model changes affect performance. The role values experience with large-scale systems problems and low-level optimization, close collaboration (including pair programming), and attention to the societal impacts of AI work.
Responsibilities
- Identify and address performance issues across ML systems used for research, training, and inference.
- Design and implement optimized kernels for TPUs (and other accelerators where relevant).
- Provide feedback to researchers about model changes and their performance impact.
- Collaborate closely with researchers and engineers, including pair programming.
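Adapting models for low-precision inference (one of the representative projects below) typically starts with weight quantization. A minimal sketch of symmetric per-tensor int8 quantization in plain Python; the function names and the per-tensor scheme are illustrative choices, not a description of any Anthropic codebase:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats onto the
    integer range [-127, 127] using a single scale factor."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]
```

Real deployments usually refine this with per-channel scales and calibration data, but the round-trip error of even this sketch is bounded by half the scale for in-range values.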
Requirements
- Significant experience optimizing ML systems for TPUs, GPUs, or other accelerators.
- Track record of solving large-scale systems and low-level optimization problems.
- Experience designing and implementing kernels for TPUs or other ML accelerators.
- Deep understanding of accelerators (for example, background in computer architecture).
- Familiarity with ML framework internals.
- Experience or familiarity with language modeling using transformers.
- Ability to build quantitative performance models and debug at low levels (including assembly-level performance debugging).
- At least a Bachelor's degree in a related field or equivalent experience.
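Building quantitative models of system performance, as the requirements above ask, often starts from a roofline estimate: a kernel's runtime is bounded below by whichever is slower, compute or memory traffic. A minimal sketch in plain Python; the accelerator numbers are made-up placeholders, not real TPU specifications:

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, mem_bw_bytes_per_s):
    """Lower-bound execution time: the kernel is limited by either
    compute throughput or memory bandwidth, whichever is slower."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / mem_bw_bytes_per_s
    return max(compute_time, memory_time)

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs per byte moved; comparing this against
    peak_flops / bandwidth tells you whether a kernel is
    compute-bound or memory-bound."""
    return flops / bytes_moved

# Illustrative (made-up) accelerator numbers.
PEAK = 100e12  # 100 TFLOP/s
BW = 1e12      # 1 TB/s bandwidth to main memory

# A bf16 matmul (M, K) @ (K, N) performs 2*M*K*N FLOPs and, at minimum,
# moves A, B, and C through memory once at 2 bytes per element.
M = K = N = 4096
flops = 2 * M * K * N
bytes_moved = 2 * (M * K + K * N + M * N)
```

At these sizes the matmul's arithmetic intensity far exceeds the machine balance `PEAK / BW`, so the roofline predicts it is compute-bound; small or skinny matmuls flip to the memory-bound side.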
Representative projects (examples of work you may do)
- Implement low-latency, high-throughput sampling for large language models.
- Adapt models for low-precision inference.
- Build quantitative models of system performance.
- Design and implement custom collective communication algorithms.
- Debug kernel performance at the assembly level.
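Custom collective communication algorithms like those above are often prototyped as plain-Python simulations before being mapped to real interconnects. A sketch of the classic bandwidth-optimal ring all-reduce, simulated with per-rank chunk lists; the data layout and function name are illustrative assumptions:

```python
def ring_all_reduce(inputs):
    """Simulate a ring all-reduce over n ranks.

    `inputs` is a list of n per-rank vectors, each pre-split into n
    numeric chunks. Returns the per-rank results; afterwards every rank
    holds the elementwise sum of all inputs. Each rank moves 2*(n-1)/n
    of the data, the standard bandwidth-optimal bound.
    """
    n = len(inputs)
    chunks = [list(v) for v in inputs]  # chunks[rank][chunk_index]

    # Phase 1, reduce-scatter: in n-1 steps each rank forwards one chunk
    # to its right neighbor, which accumulates it. Sends happen
    # simultaneously, so snapshot the outgoing values first.
    for step in range(n - 1):
        sent = [chunks[r][(r - step) % n] for r in range(n)]
        for r in range(n):
            chunks[(r + 1) % n][(r - step) % n] += sent[r]

    # After phase 1, rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: circulate those reduced chunks so every rank
    # ends with the complete result.
    for step in range(n - 1):
        sent = [chunks[r][(r + 1 - step) % n] for r in range(n)]
        for r in range(n):
            chunks[(r + 1) % n][(r + 1 - step) % n] = sent[r]

    return chunks
```

On real hardware the same schedule is expressed with point-to-point sends between neighboring chips, and variants (tree, 2D-torus) trade latency against bandwidth depending on the topology.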
Logistics & Other Details
- Locations: San Francisco, CA; New York City, NY; Seattle, WA.
- Location-based hybrid policy: staff are expected to be in an office at least 25% of the time (some roles may require more time in office).
- Visa sponsorship: Anthropic does sponsor visas and retains an immigration lawyer; sponsorship is subject to role and candidate specifics.
- Encouragement to apply even if you do not meet every qualification; Anthropic highlights diversity and inclusion.
Benefits
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
- Anthropic provides guidance on candidates' use of AI tools during the application process.