Required Skills & Competences
Security @ 4, Kubernetes @ 4, GCP @ 4, GitHub @ 4, GitHub Actions @ 4, CI/CD @ 4, Leadership @ 4, Communication @ 7, Networking @ 4
Details
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Responsibilities
- Architect the development of Groq’s hyperscale compute platform, ensuring scalability, reliability, and security.
- Define and execute technical roadmaps that advance Groq’s capability to manage large-scale general and specialized compute infrastructure efficiently.
- Lead highly technical engineering teams focused on container orchestration, hardware provisioning, and platform automation.
- Build and grow the organization: attract, hire, mentor, and retain top-tier engineers; shape a culture of automation, simplicity, rapid learning and operational excellence.
- Own production Kubernetes clusters and storage solutions distributed across several geographic regions, driving SLOs, incident response, and continual improvement.
- Enforce robust CI/CD practices (container image scanning, automated integration tests, and progressive rollouts) to keep the platform secure and rapidly evolving.
- Collaborate globally with data-center and hardware teams to ensure seamless capacity expansions, hardware refreshes, and energy-efficiency initiatives.
- Partner closely with networking to champion modern data-plane technologies (Cilium/eBPF, BGP-based service routing, advanced load balancing) for low-latency throughput and high security.
Requirements
- 10+ years in large-scale infrastructure engineering, including 3+ years leading teams that run business-critical, globally distributed fleets.
- Proven leadership experience in highly technical engineering environments.
- Strong communication, planning, negotiation, and interpersonal skills.
- Cloud & hybrid experience, ideally with GCP.
- Hands-on experience building or operating clusters and writing Golang operators, CRDs, and CLI tools.
- Experience with on-prem storage technologies and Kubernetes integrations.
- CI/CD leadership experience with pipelines (GitHub Actions, Buildkite, or similar) at hyperscale velocity.
Attributes
- Humility, collaboration, growth mindset, curiosity, innovation, passion, grit, and boldness.
Compensation
Base salary range is $300,000 to $375,000, plus equity and benefits.
Location
Palo Alto, CA
Groq is an equal opportunity employer committed to diversity, inclusion, and belonging.