Software Engineer, Compute Efficiency

USD 320,000-405,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Required Skills & Competences

  • Go @ 6
  • Kubernetes @ 3
  • Linux @ 3
  • Python @ 6
  • GCP @ 3
  • Java @ 6
  • Distributed Systems @ 3
  • Machine Learning @ 3
  • AWS @ 3
  • Communication @ 3
  • Networking @ 3
  • Performance Optimization @ 3
  • Rust @ 6
  • AI @ 3
  • NCCL @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. As infrastructure scales rapidly, the Capacity team focuses on optimizing performance, cost, and sustainability without compromising reliability or latency. This role works across the full infrastructure stack—from cloud platforms and networking to application-level performance—and bridges research needs with low-level hardware constraints.

Responsibilities

  • Build and evolve telemetry and monitoring systems to provide visibility into infrastructure performance, utilization, and costs across cloud and datacenter fleets.
  • Design and implement cost attribution frameworks for multi-tenant infrastructure to enable teams to understand and optimize resource consumption.
  • Identify and resolve performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale.
  • Partner with cloud service providers and internal stakeholders to optimize cluster configurations, workload placement, and resource utilization for AI training and inference workloads (including very large clusters).
  • Develop and champion engineering practices around efficiency, driving a culture of performance awareness and cost-conscious design.
  • Collaborate with research and product teams to understand infrastructure needs and design solutions that balance performance with cost efficiency.
  • Drive architectural improvements and code-level optimizations across services and platforms to deliver measurable utilization and performance gains.

Requirements

  • 6+ years of relevant industry experience, with 1+ year leading large-scale, complex projects or teams as a software engineer or tech lead.
  • Deep expertise in distributed systems at scale, with a strong focus on infrastructure reliability, scalability, and continuous improvement.
  • Strong proficiency in at least one programming language (e.g., Python, Rust, Go, or Java).
  • Hands-on experience with cloud infrastructure, including Kubernetes, Infrastructure as Code, and major cloud providers such as AWS or GCP.
  • Experience optimizing end-to-end performance of distributed systems, including workload right-sizing and resource utilization tuning.
  • Experience designing or working with performance and utilization monitoring tools in large-scale, distributed environments.
  • Strong problem-solving skills and the ability to work independently and navigate ambiguity.
  • Excellent communication and collaboration skills to work closely with internal and external stakeholders.

Strong candidates may have

  • Experience with machine learning infrastructure workloads and associated networking technologies like NCCL.
  • Low-level systems experience (e.g., Linux kernel tuning, eBPF).
  • Published work in performance optimization and scaling distributed systems.

Annual Salary

  • $320,000 - $405,000 USD

Logistics

  • Education requirements: At least a Bachelor's degree in a related field or equivalent experience.
  • Location-based hybrid policy: Staff are expected to be in one of Anthropic's offices at least 25% of the time; some roles may require more time in offices.
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship success may vary by role and candidate.

Benefits

Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office collaboration spaces.

How we're different

Anthropic emphasizes large-scale, high-impact AI research as a cohesive team, values communication, and pursues directions such as interpretability, scaling laws, and AI safety. Applicants are encouraged to apply even if they do not meet every listed qualification.