Required Skills & Competences
Grafana @ 4, Kubernetes @ 4, Prometheus @ 4, Terraform @ 7, TypeScript @ 4, Communication @ 4, IaaS @ 7, Rust @ 4, OpenTelemetry @ 4
Details
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Responsibilities
- Build and maintain comprehensive observability systems at massive scale with excellent uptime.
- Iterate on, maintain, update, automate, and dogfood your own systems with great monitoring.
- Instrument Kubernetes clusters, applications, and datacenter infrastructure components such as switches, PDUs, environmental sensors, cameras, and chillers.
- Work with effective canonical logging and cost control.
- Provide tracing expertise, including context propagation, tail sampling strategies, attribute enrichment, and querying.
- Collect metrics from various systems such as hosts, kube-state-metrics, kubelet, IPMI, and SNMP.
- Advise teams on instrumenting applications in Rust, C++, TypeScript, and Go.
- Implement sensible SLO and alerting strategies and on-call best practices.
Requirements
- 4+ years of experience in observability as a core role responsibility.
- Deep understanding of cloud-native technologies and infrastructure as code (IaC) tools such as Terraform and Flux.
- Experience instrumenting large Kubernetes clusters and building operators.
- Expertise with monitoring, observability, and alerting systems such as OpenTelemetry tracing and the OpenTelemetry Collector, Grafana/Prometheus, PagerDuty, Alertmanager, IPMI, and SNMP.
- Strong analytical and problem-solving skills, focusing on root cause analysis and mitigation.
- Excellent communication and teamwork skills for collaboration across engineering teams.
Attributes
- Humility and egoless collaboration.
- Collaborative & team savvy.
- Growth & giver mindset.
- Curious & innovative.
- Passion, grit, & boldness.
Compensation
Competitive base salary ranging from $186,915 to $252,885, plus equity and benefits.
Location
Some roles may require proximity to primary sites.
Groq is an Equal Opportunity Employer committed to diversity, inclusion, and belonging.