Performance Engineer

at Anthropic

📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States

USD 315,000-560,000 per year

Level: Middle

✅ Hybrid


Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level. Proficiency levels:

1-2 — basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Algorithms @ 3, Distributed Systems @ 3, Machine Learning @ 3, Communication @ 6, Debugging @ 3, GPU @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

As a Performance Engineer, you'll be responsible for identifying systems-level problems that arise when running machine learning algorithms at scale, and for developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow into experts in ML.

Responsibilities

Identify and solve novel systems problems that arise when running ML at scale.
Implement low-latency, high-throughput sampling for large language models.
Implement GPU kernels and adapt models for low-precision inference.
Design and implement custom load-balancing algorithms to optimize serving efficiency.
Build quantitative models of system performance.
Design and implement fault-tolerant distributed systems operating with complex network topologies.
Debug kernel-level network latency spikes in containerized environments.
Pair program and collaborate closely with researchers and engineers.

Requirements

Significant software engineering or machine learning experience, particularly at supercomputing scale.
Track record of solving large-scale systems problems.
Results-oriented with a bias toward flexibility and impact.
Willingness to pick up work outside a narrow job description.
Strong communication skills and enjoyment of pair programming.
Education: at least a Bachelor's degree in a related field or equivalent experience.

Strong candidates may also have experience with:

High-performance, large-scale ML systems
GPU / accelerator programming
ML framework internals
OS internals
Language modeling with transformers

Representative projects

Low-latency, high-throughput sampling for LLMs
Implementing GPU kernels for low-precision inference
Custom load-balancing algorithms for serving efficiency
Building quantitative performance models
Fault-tolerant distributed system design and implementation
Debugging kernel-level network latency in containers

Compensation and Benefits

Annual base salary: $315,000 to $560,000 USD (total compensation includes equity and benefits, and may include incentive compensation).
Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.

Logistics

Locations: San Francisco, CA; New York City, NY; Seattle, WA.
Location-based hybrid policy: staff are expected to be in one of our offices at least 25% of the time (some roles may require more office time).
Visa sponsorship: Anthropic does sponsor visas, though not every role or candidate can be sponsored. If an offer is made, Anthropic will make reasonable efforts to secure a visa and retains an immigration lawyer to help with the process.
Deadline to apply: None (applications reviewed on a rolling basis).

How we're different

Work as a single cohesive team on a few large-scale research efforts.
Value impact, collaboration, and communication.
Current research continues directions the team pursued before Anthropic, including GPT-3, interpretability, scaling laws, AI & compute, and learning from human preferences.

How to apply

Candidates are encouraged to apply even if they do not meet every listed qualification. Anthropic values diverse perspectives and recognizes that strong candidates may not match every requirement.