Network Engineer, Capacity and Efficiency

USD 320,000-405,000 per year
MIDDLE
✅ Hybrid
✅ Visa Sponsorship

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Go @ 3, Grafana @ 3, Python @ 3, GCP @ 6, AWS @ 6, Communication @ 6, Networking @ 3, SRE @ 3, Observability @ 3, AI @ 3, InfiniBand @ 3, HPC @ 3

Details

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Capacity & Efficiency team within Compute owns cost, utilization, and attribution for non-accelerator infrastructure — the network, compute, and storage backbone that moves petabytes between training clusters, inference fleets, and object storage across clouds and regions. Anthropic runs a private multi-cloud backbone built from dark fiber, optical transport, and CSP direct-connect products, layered over data center fabrics spanning tens of thousands of hosts.

This hands-on individual contributor role lives at the intersection of systems networking and observability: you will instrument the network, model cost-per-bit, find inefficiencies, and implement changes. You will write code (Python, Go), build dashboards, model capacity, and ship config changes to production routers. The role spans network telemetry and observability, traffic engineering, and cost modeling/attribution.

Responsibilities

  • Build the network observability stack: design and deploy telemetry pipelines (sFlow/IPFIX, gNMI streaming, eBPF host probes) that convert packet counters into per-flow, per-tenant, per-workload cost and utilization data. Own SLIs for backbone and DCN fabric health.
  • Hunt for efficiency: analyze inter-region traffic patterns, identify hot links and stranded capacity, quantify dollar impact, and decide whether to buy capacity or move workloads.
  • Own QoS and traffic engineering: design and operate traffic classification, marking, and shaping across the backbone to protect latency-sensitive inference traffic from bulk transfers.
  • Drive cost attribution: tie network spend (egress, interconnect ports, transit, optical leases) back to teams and workloads to inform capacity planning and workload placement.
  • Influence other teams: present findings to research, finance, and Systems Networking to drive changes based on data.
  • Automate: extend intent-based configuration systems and write tooling that converts efficiency findings into safe, reviewable production changes.

Requirements

  • 5+ years operating large-scale production networks (data center fabrics — spine-leaf/Clos, backbone/WAN, or hyperscaler-adjacent environments).
  • Fluency across the stack: BGP (policy, communities), ECMP, VXLAN/EVPN or equivalent overlays, QoS (DSCP, queuing, shaping), and L1/optical basics (DWDM, coherent, LAGs).
  • Deep knowledge of at least one major CSP networking model (AWS: VPC, TGW, Direct Connect; or GCP: Shared VPC, Interconnect, Cloud Router, Network Connectivity Center) and how overlays interact with physical underlays.
  • Experience building or operating network telemetry at scale: streaming telemetry (gNMI/OpenConfig), flow export (sFlow, IPFIX, NetFlow), or eBPF-based host instrumentation. Able to reason about sampling, cardinality, and storage tradeoffs.
  • Comfortable writing Python or Go for tooling, telemetry pipelines, infrastructure-as-code, and network device automation, and shipping that code to production.
  • Quantitative approach: able to turn counter data into defensible cost models and operational recommendations.
  • Strong communication skills: able to explain technical tradeoffs to finance and network engineering partners.

Strong candidates may also have

  • SRE experience for large-scale network infrastructure: SLOs/SLIs, capacity planning with error budgets, incident response for network-impacting outages.
  • Background building or operating cloud provider networking or interconnect/control-plane products.
  • Familiarity with AI/ML infrastructure traffic patterns (all-reduce, checkpoint/weight transfer, inference serving) and their impact on network behavior.
  • Experience with HPC fabrics (InfiniBand, RoCE v2), job-placement/congestion interactions, or high-radix topologies.
  • Traffic engineering experience for large backbones and judgment about when TE complexity is warranted.
  • Hands-on multi-cloud connectivity experience and understanding of cross-cloud billing models.
  • Experience building cost/chargeback systems or FinOps exposure in large cloud environments.

Representative projects

  • Build a per-flow cost attribution pipeline that attributes every byte of cross-region egress to the originating team and workload.
  • Design QoS policy for the private backbone to prevent bulk checkpoint transfers from starving inference traffic.
  • Model whether to buy an additional 1.6 Tb/s interconnect tranche or re-route traffic through existing capacity.
  • Instrument DCN fabric utilization with streaming telemetry and build Grafana dashboards as the team’s source of truth.

Compensation

  • Annual Salary: $320,000 - $405,000 USD

Logistics

  • Minimum education: Bachelor’s degree or equivalent combination of education, training, and/or experience.
  • Required field of study: a field relevant to the role as demonstrated through coursework, training, or professional experience.
  • Minimum years of experience: will correlate with internal job level requirements.
  • Location-based hybrid policy: expected to be in one of Anthropic’s offices at least 25% of the time (some roles may require more office time).
  • Visa sponsorship: Anthropic states that it sponsors visas and retains an immigration lawyer to assist, though sponsorship success varies by role and candidate.

Why this role, why now

Anthropic’s network footprint is growing rapidly. The role offers an opportunity to build the measurement and optimization layer from the ground up, with direct budget impact and influence on infrastructure scaling.

Benefits

Anthropic offers competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office environment.