Senior Staff Infrastructure Engineer, GroqCloud

at Groq
USD 282,100-331,900 per year
SENIOR
✅ Remote

Required Skills & Competences

Security @ 4 Go @ 4 Grafana @ 4 Kubernetes @ 4 Prometheus @ 4 VictoriaMetrics @ 4 Terraform @ 4 GCP @ 6 CI/CD @ 4 ArgoCD @ 4 Hiring @ 4 Networking @ 4 Rust @ 4 Debugging @ 4 Swift @ 4 Compliance @ 4

Details

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.

This role's mission is to design, build, and operate large-scale cloud systems to deliver the fastest inference engine in the world.

Responsibilities

  • Infrastructure Development: Design, build, and automate cloud infrastructure using Terraform to support a wide variety of needs.
  • Service Deployment & Orchestration: Build and manage robust deployment pipelines and GitOps workflows into Kubernetes-based environments. Continuously improve CI/CD processes to facilitate rapid, reliable rollouts of new features and services, ensuring minimal downtime and maximum velocity.
  • System Troubleshooting: Lead investigations to determine the root causes of system failures, and develop scripts to repair infrastructure components and automate their upkeep.
  • Observability Enhancement: Implement comprehensive monitoring (tracing, metrics, logging, alerting) to swiftly pinpoint, diagnose, and resolve system issues.
  • Efficient Incident Response: Manage critical system incidents as a first responder, ensuring swift resolution and comprehensive post-incident analyses with implemented remediations.
  • Cross-Functional Collaboration: Collaborate with software engineers, platform and networking engineers, product managers, and sales to enable feature delivery.

Requirements

  • 10+ years of experience in software engineering or a related field.
  • 5+ years of experience with GCP (especially VPC, Hybrid Networking, IAM, and GKE).
  • Current, hands-on experience with modern Infrastructure-as-Code technologies (Kubernetes, Terraform, Flux/ArgoCD, Kustomize, Crossplane).
  • Experience with open-source monitoring tools (Prometheus, Grafana, VictoriaMetrics, VictoriaLogs, and Alertmanager).
  • Deep experience in cloud technologies, global scale applications, and automation.
  • Familiarity with multi-region deployments, including the associated networking, latency, and failover challenges.
  • History of debugging and mitigating production issues and driving them to efficient resolution.
  • Comfortable reading, writing, and debugging software in multiple languages, especially Go and Rust.
  • Thorough understanding of cloud-security best practices and modern compliance controls.

Compensation & Benefits

  • Base salary range (United States): $282,100 to $331,900 (determined by location, skills, qualifications, experience and internal benchmarks).
  • Compensation package includes equity and benefits; local market compensation applies for candidates outside the USA.

Equal Opportunity & Other Notes

  • Groq is an Equal Opportunity Employer and is committed to creating an inclusive environment for all employees and applicants.
  • Reasonable accommodations are available; contact [email protected] for accommodation requests.
  • All offers are contingent upon verification of identity and employment authorization in accordance with federal law.
  • Groq encourages people with criminal record histories to apply and references several local fair chance hiring laws for applicable jurisdictions.