Required Skills & Competences
Security @ 4, Go @ 4, Grafana @ 4, Kubernetes @ 4, Prometheus @ 4, VictoriaMetrics @ 4, Terraform @ 4, GCP @ 6, CI/CD @ 4, ArgoCD @ 4, Hiring @ 4, Networking @ 4, Rust @ 4, Debugging @ 4, Swift @ 4, Compliance @ 4
Details
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high-performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
This role's mission is to design, build, and operate large-scale cloud systems to deliver the fastest inference engine in the world.
Responsibilities
- Infrastructure Development: Design, build, and automate cloud infrastructure using Terraform to support a wide variety of needs.
- Service Deployment & Orchestration: Build and manage robust deployment pipelines and GitOps workflows into Kubernetes-based environments. Continuously improve CI/CD processes to facilitate rapid, reliable rollouts of new features and services, ensuring minimal downtime and maximum velocity.
- System Troubleshooting: Lead investigations to determine the root causes of system failures, and develop scripts to repair infrastructure components and automate their upkeep.
- Observability Enhancement: Implement comprehensive monitoring (tracing, metrics, logging, alerting) to swiftly pinpoint, diagnose, and resolve system issues.
- Efficient Incident Response: Manage critical system incidents as a first responder, ensuring swift resolution followed by thorough post-incident analysis and implemented remediations.
- Cross-Functional Collaboration: Collaborate with software engineers, platform and networking engineers, product managers, and sales to enable feature delivery.
Requirements
- 10+ years of experience in software engineering or a related field.
- 5+ years of experience with GCP (especially VPC, Hybrid Networking, IAM, and GKE).
- Active, hands-on work with modern Infrastructure-as-Code technologies (Kubernetes, Terraform, Flux/ArgoCD, Kustomize, Crossplane).
- Experience with open-source monitoring tools (Prometheus, Grafana, VictoriaMetrics, VictoriaLogs, and Alertmanager).
- Deep experience in cloud technologies, global scale applications, and automation.
- Familiarity with multi-region deployments, including the associated networking, latency, and failover challenges.
- A track record of debugging production issues, mitigating their impact, and driving them to efficient resolution.
- Comfortable reading, writing, and debugging software in multiple languages, especially Go and Rust.
- Thorough understanding of cloud-security best practices and modern compliance controls.
Compensation & Benefits
- Base salary range (United States): $282,100 to $331,900 (determined by location, skills, qualifications, experience and internal benchmarks).
- Compensation package includes equity and benefits; local market compensation applies for candidates outside the USA.
Equal Opportunity & Other Notes
- Groq is an Equal Opportunity Employer and is committed to creating an inclusive environment for all employees and applicants.
- Reasonable accommodations are available; contact [email protected] for accommodation requests.
- All offers are contingent upon verification of identity and employment authorization in accordance with federal law.
- Groq encourages people with criminal record histories to apply and references several local fair chance hiring laws for applicable jurisdictions.