Performance Engineer

USD 315,000-560,000 per year
MIDDLE
✅ Hybrid

Required Skills & Competences

  • Algorithms @ 3
  • Distributed Systems @ 3
  • Machine Learning @ 3
  • Communication @ 6
  • Debugging @ 3
  • GPU @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

As a Performance Engineer, you'll be responsible for identifying systems-level problems when running machine learning algorithms at scale and developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow to become an expert in ML.

Responsibilities

  • Identify and solve novel systems problems that arise when running ML at scale.
  • Implement low-latency, high-throughput sampling for large language models.
  • Implement GPU kernels and adapt models for low-precision inference.
  • Design and implement custom load-balancing algorithms to optimize serving efficiency.
  • Build quantitative models of system performance.
  • Design and implement fault-tolerant distributed systems operating with complex network topologies.
  • Debug kernel-level network latency spikes in containerized environments.
  • Pair program and collaborate closely with researchers and engineers.

Requirements

  • Significant software engineering or machine learning experience, particularly at supercomputing scale.
  • Track record of solving large-scale systems problems.
  • Results-oriented with a bias toward flexibility and impact.
  • Willingness to pick up work outside a narrow job description.
  • Enjoy pair programming and have strong communication skills.
  • Education: at least a Bachelor's degree in a related field or equivalent experience.

Strong candidates may also have experience with:

  • High performance, large-scale ML systems
  • GPU / accelerator programming
  • ML framework internals
  • OS internals
  • Language modeling with transformers

Representative projects

  • Low-latency, high-throughput sampling for LLMs
  • Implementing GPU kernels for low-precision inference
  • Custom load-balancing algorithms for serving efficiency
  • Building quantitative performance models
  • Fault-tolerant distributed system design and implementation
  • Debugging kernel-level network latency in containers

Compensation and Benefits

  • Annual base salary: $315,000 - $560,000 USD (total compensation includes equity, benefits, and may include incentive compensation).
  • Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.

Logistics

  • Locations: San Francisco, CA; New York City, NY; Seattle, WA.
  • Location-based hybrid policy: staff are expected to be in one of our offices at least 25% of the time (some roles may require more office time).
  • Visa sponsorship: Anthropic does sponsor visas, though not every role or candidate can be sponsored. If an offer is made, Anthropic will make reasonable efforts to secure a visa and retains an immigration lawyer to help.
  • Deadline to apply: None (applications reviewed on a rolling basis).

How we're different

  • Work as a single cohesive team on a few large-scale research efforts.
  • Value impact, collaboration, and communication.
  • Research directions include work related to GPT-3, interpretability, scaling laws, AI & compute, and learning from human preferences.

How to apply

  • Candidates are encouraged to apply even if they do not meet every qualification. Anthropic values diverse perspectives and recognizes that strong candidates may not meet every listed requirement.