Used Tools & Technologies
Not specified
Required Skills & Competences
Algorithms @ 3, Distributed Systems @ 3, Hiring @ 3, Communication @ 5, Performance Optimization @ 6, PyTorch @ 3, CUDA @ 3, GPU @ 3
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The team builds large-scale, safe AI systems and is hiring a GPU Performance Engineer to maximize GPU utilization and performance for Claude and other large language models.
Role overview
Pioneering the next generation of AI requires breakthrough innovations in GPU performance and systems engineering. As a GPU Performance Engineer, you will architect and implement the foundational systems that power large language models. The work spans the full stack, from low-level tensor core and kernel optimizations to distributed system architectures that orchestrate thousands of GPUs.
You will implement techniques ranging from custom kernel development to distributed communication strategies, profile and eliminate production performance bottlenecks, and partner with hardware vendors to influence accelerator capabilities.
Responsibilities
- Maximize GPU utilization and performance for training and inference at large scale
- Develop and implement custom kernels and kernel fusion strategies
- Optimize tensor core usage and memory bandwidth to reduce bottlenecks
- Profile production systems and eliminate performance bottlenecks (e.g., using Nsight)
- Design distributed communication and orchestration strategies for multi-node GPU clusters
- Build performance modeling frameworks to predict and optimize GPU utilization
- Ensure resilience and fault tolerance for large-scale training and serving infrastructure
- Collaborate with researchers, engineers, and hardware vendors to co-design algorithms and hardware-aware optimizations
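As a concrete (if simplified) illustration of the performance-modeling responsibility above, a roofline-style estimate predicts whether a kernel will be compute-bound or memory-bound before it ever runs. This is a minimal sketch in plain Python; the accelerator numbers below are hypothetical, not any specific GPU's spec sheet:

```python
# Minimal roofline-style performance model (illustrative sketch; the
# hardware numbers are hypothetical, not a specific accelerator's specs).

def predicted_time_s(flops, bytes_moved, peak_flops, peak_bw):
    """A kernel is limited by whichever resource it saturates first:
    compute (flops / peak_flops) or memory traffic (bytes / peak_bw)."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Hypothetical accelerator: 300 TFLOP/s compute, 2 TB/s memory bandwidth.
PEAK_FLOPS = 300e12
PEAK_BW = 2e12

# A large fp16 matmul (m = n = k = 8192): 2*m*n*k flops, 2 bytes/element
# for each of the three operand matrices.
m = n = k = 8192
flops = 2 * m * n * k
bytes_moved = 2 * (m * k + k * n + m * n)

t = predicted_time_s(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
intensity = flops / bytes_moved      # arithmetic intensity, flop per byte
ridge = PEAK_FLOPS / PEAK_BW         # ridge point of the roofline
print(f"arithmetic intensity: {intensity:.0f} flop/B (ridge: {ridge:.0f})")
print("compute-bound" if intensity > ridge else "memory-bound")
```

Real performance-modeling frameworks layer many refinements on top of this (cache behavior, occupancy, overlap), but the ridge-point comparison is the usual starting point.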
Requirements
- At least a Bachelor's degree in a related field or equivalent experience
- Deep experience with GPU programming and performance optimization at scale
- Strong familiarity with GPU kernel development and low-level optimizations
- Experience across the stack: hardware interfaces, ML frameworks, and distributed systems
- Experience or familiarity with profiling, performance modeling, and production ML systems
- Ability to work collaboratively in ambiguous, research-driven environments
Strong candidates may also have experience with:
- GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization
- ML compilers & frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators
- Performance engineering: kernel fusion, memory bandwidth optimization, profiling with Nsight
- Distributed systems: NCCL, NVLink, collective communication, model parallelism
- Low-precision and quantization: INT8/FP8 quantization, mixed-precision techniques
- Production systems: large-scale training infrastructure, fault tolerance, cluster orchestration
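To make the low-precision bullet concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is illustrative only; production stacks use framework-native INT8/FP8 kernels, and the example weights are made up:

```python
# Symmetric INT8 quantization round-trip (illustrative sketch of the idea
# behind low-precision inference, not production code).

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with one per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.02, -1.27, 0.64, 0.9]       # hypothetical weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Round-to-nearest bounds the error by half a quantization step.
assert max_err <= scale / 2 + 1e-12
```

The half-step error bound is why per-tensor symmetric quantization works well for weights with a moderate dynamic range, and why outliers (which inflate `scale`) motivate per-channel or blockwise schemes.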
Representative projects
- Co-design attention mechanisms and algorithms for next-generation hardware
- Develop custom kernels for emerging quantization formats and mixed-precision techniques
- Design distributed communication strategies for multi-node GPU clusters
- Optimize end-to-end training and inference pipelines for frontier language models
- Build performance modeling frameworks to predict and optimize GPU utilization
- Implement kernel fusion strategies to minimize memory bandwidth bottlenecks
- Create resilient systems for planet-scale distributed training infrastructure
- Profile and eliminate performance bottlenecks in production serving infrastructure
- Partner with hardware vendors to influence future accelerator capabilities and software stacks
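The distributed-communication projects above typically start from simple analytical models. As a sketch, the standard bandwidth estimate for a ring all-reduce (the bandwidth and message size below are hypothetical, chosen only to make the arithmetic concrete):

```python
# Back-of-envelope cost model for a ring all-reduce (sketch; link bandwidth
# and gradient size are hypothetical, and latency terms are ignored).

def ring_allreduce_bytes_per_gpu(size_bytes, n_gpus):
    """Each GPU sends 2*(N-1)/N of the buffer: a reduce-scatter pass
    followed by an all-gather pass, each moving (N-1)/N of the data."""
    return 2 * (n_gpus - 1) / n_gpus * size_bytes

def allreduce_time_s(size_bytes, n_gpus, link_bw_bytes_s):
    return ring_allreduce_bytes_per_gpu(size_bytes, n_gpus) / link_bw_bytes_s

# Hypothetical: 1 GiB of gradients, 8 GPUs, 400 GB/s effective link bandwidth.
size = 1 << 30
t = allreduce_time_s(size, 8, 400e9)
print(f"ring all-reduce estimate: {t * 1e3:.2f} ms")
# As N grows, 2*(N-1)/N approaches 2, so per-GPU traffic is nearly
# bandwidth-optimal and independent of cluster size.
```

Models like this guide choices between ring and tree collectives, overlap of communication with compute, and how to shard parallelism across NVLink versus inter-node fabrics.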
Logistics
- Location-based hybrid policy: staff are expected to be in an office at least 25% of the time (may vary by role)
- Visa sponsorship: Anthropic does sponsor visas and retains immigration counsel; sponsorship availability varies by role
- Education: at least a Bachelor's degree in a related field or equivalent experience
- Deadline to apply: None (applications reviewed on a rolling basis)
Benefits and culture
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and offices for collaboration
- Collaborative, research-driven environment valuing high-impact work and communication
- Encouragement to apply even if you don't meet every qualification