TPU Kernel Engineer

USD 280,000-560,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Algorithms @ 3
  • Communication @ 3
  • Debugging @ 3

Details

Anthropic is building reliable, interpretable, and steerable AI systems and seeks a TPU Kernel Engineer to identify and address performance issues across the ML systems it uses for research, training, and inference. A significant portion of the role involves designing and optimizing TPU kernels and giving researchers feedback on how model changes affect performance. The role values experience with large-scale systems problems and low-level optimization, collaboration (including pair programming), and attention to the societal impacts of AI work.

Responsibilities

  • Identify and address performance issues across ML systems used for research, training, and inference.
  • Design and implement optimized kernels for TPUs (and other accelerators where relevant).
  • Provide feedback to researchers about model changes and their performance impact.
  • Implement low-latency, high-throughput sampling for large language models.
  • Adapt existing models for low-precision inference.
  • Build quantitative models of system performance.
  • Design and implement custom collective communication algorithms.
  • Debug kernel performance at the assembly level.
  • Collaborate closely with researchers and engineers, including pair programming.

Requirements

  • Significant experience optimizing ML systems for TPUs, GPUs, or other accelerators.
  • Track record of solving large-scale systems and low-level optimization problems.
  • Experience designing and implementing kernels for TPUs or other ML accelerators.
  • Deep understanding of accelerators (for example, background in computer architecture).
  • Familiarity with ML framework internals.
  • Experience or familiarity with language modeling using transformers.
  • Ability to build quantitative performance models and debug at low levels (including assembly-level performance debugging).
  • At least a Bachelor's degree in a related field or equivalent experience.

Representative projects (examples of work you may do)

  • Implement low-latency, high-throughput sampling for large language models.
  • Adapt models for low-precision inference.
  • Build quantitative models of system performance.
  • Design and implement custom collective communication algorithms.
  • Debug kernel performance at the assembly level.

Logistics & Other Details

  • Locations: San Francisco, CA; New York City, NY; Seattle, WA.
  • Location-based hybrid policy: staff are expected to be in an office at least 25% of the time (some roles may require more time in office).
  • Visa sponsorship: Anthropic does sponsor visas and retains an immigration lawyer; sponsorship is subject to role and candidate specifics.
  • Candidates are encouraged to apply even if they do not meet every qualification; Anthropic emphasizes diversity and inclusion.

Benefits

  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
  • Anthropic provides guidance on candidates' use of AI during the application process.