Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — expert. You can teach others, and you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Algorithms @ 3
Distributed Systems @ 3
Machine Learning @ 3
Communication @ 6
Debugging @ 3
GPU @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
As a Performance Engineer, you'll be responsible for identifying systems-level problems when running machine learning algorithms at scale and developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow to become an expert in ML.
Responsibilities
- Identify and solve novel systems problems that arise when running ML at scale.
- Implement low-latency, high-throughput sampling for large language models.
- Implement GPU kernels and adapt models for low-precision inference.
- Design and implement custom load-balancing algorithms to optimize serving efficiency.
- Build quantitative models of system performance.
- Design and implement fault-tolerant distributed systems operating with complex network topologies.
- Debug kernel-level network latency spikes in containerized environments.
- Pair program and collaborate closely with researchers and engineers.
Requirements
- Significant software engineering or machine learning experience, particularly at supercomputing scale.
- Track record of solving large-scale systems problems.
- Results-oriented with a bias toward flexibility and impact.
- Willingness to pick up work outside a narrow job description.
- Enjoys pair programming and has strong communication skills.
- Education: at least a Bachelor's degree in a related field or equivalent experience.
Strong candidates may also have experience with:
- High performance, large-scale ML systems
- GPU / accelerator programming
- ML framework internals
- OS internals
- Language modeling with transformers
Representative projects
- Low-latency, high-throughput sampling for LLMs
- Implementing GPU kernels for low-precision inference
- Custom load-balancing algorithms for serving efficiency
- Building quantitative performance models
- Fault-tolerant distributed system design and implementation
- Debugging kernel-level network latency in containers
Compensation and Benefits
- Annual base salary: $315,000 - $560,000 USD (total compensation includes equity, benefits, and may include incentive compensation).
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
Logistics
- Locations: San Francisco, CA; New York City, NY; Seattle, WA.
- Location-based hybrid policy: staff are expected to be in one of our offices at least 25% of the time (some roles may require more office time).
- Visa sponsorship: Anthropic sponsors visas, though not every role or candidate can be sponsored. If an offer is made, Anthropic will make reasonable efforts to sponsor the candidate and will retain an immigration lawyer to help.
- Deadline to apply: None (applications reviewed on a rolling basis).
How we're different
- Work as a single cohesive team on a few large-scale research efforts.
- Value impact, collaboration, and communication.
- Research directions include work related to GPT-3, interpretability, scaling laws, AI & compute, and learning from human preferences.
How to apply
- Candidates are encouraged to apply even if they do not meet every qualification. Anthropic values diverse perspectives and recognizes that strong candidates may not meet every listed requirement.