Required Skills & Competences
Security @ 4, Grafana @ 3, Kubernetes @ 4, Prometheus @ 3, Terraform @ 4, Python @ 4, Hiring @ 4, Communication @ 4, FastAPI @ 4, Mentoring @ 4, API @ 7, OpenTelemetry @ 4
Details
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high performance AI compute more accessible and affordable.
Mission
As a member of the Groq Backbone team, your mission is to scale GroqCloud's global footprint — delivering a consistent, high-performance experience with maximum availability to users everywhere.
Responsibilities
- Architect and design the backbone for inter-DC transport networks.
- Partner with regional network teams to harmonize routing, peering, and policy across sites.
- Serve as primary liaison for external service providers (ISPs, dark-fiber carriers) and coordinate joint troubleshooting.
- Monitor and optimize transport performance (link utilization, packet loss, jitter, latency) using telemetry such as NetFlow, sFlow, and OpenTelemetry.
- Build and maintain GitOps-driven pipelines for backbone configuration and lifecycle management using Buildkite, Terraform, Python, and Kubernetes.
- Optimize and tune transport performance across global networks.
- Coordinate with cross-geo teams and mentor others.
Requirements / Ideal Candidate
- 7+ years designing, deploying, and operating inter-DC transport networks.
- Deep backbone expertise: experience with IS-IS, MPLS, Segment Routing (SR), Traffic Engineering (TE), L3VPN, Anycast, dark fiber, and wave/optical platforms.
- Routing proficiency: BGP (including full internet routing tables and IXs), IS-IS, SR, TE; experience with route reflectors and BGP communities.
- Strong automation & scripting experience: Terraform/CDKTF, Python, vendor APIs, and prior automation of multiple networks.
- Experience with GitOps-driven configuration management and pipelines (Buildkite, Terraform, Kubernetes).
- Cloud-native application development and delivery experience, including packaging and testing automation delivered as RESTful Python applications (FastAPI).
- Familiarity with monitoring & telemetry: NetFlow, sFlow, Prometheus ecosystem, Grafana, and OpenTelemetry for end-to-end visibility.
- Security knowledge: IPsec, MACsec, ACLs, and DDoS mitigation in a global network context.
- Excellent analytical, communication, and collaboration skills; comfortable leading cross-geo teams and mentoring others.
Compensation
- USA base salary range: $203,200 to $239,100. Total compensation also includes equity and benefits. Compensation outside the USA will depend on the local market.
Location & Work Model
- If you are located near Groq's offices in Palo Alto, California or Toronto, Canada, you may be asked to work a hybrid schedule.
- The listing includes hybrid, remote, and onsite hiring indicators (#LI-Remote, #LI-Hybrid, #LI-Onsite).
Why Join Us / Benefits
- Purposeful hiring and intentional team composition.
- Opportunity to build and shape the company's trajectory.
- Mission-driven work tackling challenging problems.
- High-performance culture and standards.
Equal Opportunity / Misc
- Groq is an Equal Opportunity Employer and committed to providing reasonable accommodations for applicants with disabilities. Accommodation requests can be sent to [email protected].
- All offers are contingent upon verification of identity and employment authorization.