Required Skills & Competences
- Algorithms @ 3
- Distributed Systems @ 3
- Machine Learning @ 3
- Communication @ 6
- Debugging @ 3
- GPU @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
As a Performance Engineer, you'll be responsible for identifying systems-level problems when running machine learning algorithms at scale and developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow to become an expert in ML.
Responsibilities
- Identify and solve novel systems problems that arise when running ML at scale.
- Implement low-latency, high-throughput sampling for large language models.
- Implement GPU kernels and adapt models for low-precision inference.
- Design and implement custom load-balancing algorithms to optimize serving efficiency.
- Build quantitative models of system performance.
- Design and implement fault-tolerant distributed systems operating with complex network topologies.
- Debug kernel-level network latency spikes in containerized environments.
- Pair program and collaborate closely with researchers and engineers.
Requirements
- Significant software engineering or machine learning experience, particularly at supercomputing scale.
- Track record of solving large-scale systems problems.
- Results-oriented with a bias toward flexibility and impact.
- Willingness to pick up work outside a narrow job description.
- Enjoyment of pair programming and strong communication skills.
- Education: at least a Bachelor's degree in a related field or equivalent experience.
Strong candidates may also have experience with:
- High performance, large-scale ML systems
- GPU / accelerator programming
- ML framework internals
- OS internals
- Language modeling with transformers
Representative projects
- Low-latency, high-throughput sampling for LLMs
- Implementing GPU kernels for low-precision inference
- Custom load-balancing algorithms for serving efficiency
- Building quantitative performance models
- Fault-tolerant distributed system design and implementation
- Debugging kernel-level network latency in containers
Compensation and Benefits
- Annual base salary: $315,000 - $560,000 USD (total compensation includes equity and benefits, and may include incentive compensation).
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and collaborative office spaces.
Logistics
- Locations: San Francisco, CA; New York City, NY; Seattle, WA.
- Location-based hybrid policy: staff are expected to be in one of our offices at least 25% of the time (some roles may require more office time).
- Visa sponsorship: Anthropic sponsors visas, though it cannot sponsor every role or every candidate. If an offer is made, Anthropic will make reasonable efforts to sponsor the candidate and will retain an immigration lawyer to help.
- Deadline to apply: None (applications reviewed on a rolling basis).
How we're different
- Work as a single cohesive team on a few large-scale research efforts.
- Value impact, collaboration, and communication.
- Research directions include work related to GPT-3, interpretability, scaling laws, AI & compute, and learning from human preferences.
How to apply
- Candidates are encouraged to apply even if they do not meet every qualification. Anthropic values diverse perspectives and recognizes that strong candidates may not meet every listed requirement.