Required Skills & Competences
Go @ 3, Grafana @ 4, Kubernetes @ 4, Terraform @ 4, Python @ 3, GCP @ 4, Distributed Systems @ 4, AWS @ 4, Azure @ 4, Communication @ 4, Networking @ 4, AWS EKS @ 4
Details
Grafana Labs is a remote-first, open-source company building the Grafana observability stack used by millions. The Platform InfraCore squad provides and owns automation for provisioning cloud provider resources (networking, Kubernetes clusters, and CSP-specific resources) used by application teams. This role focuses on performance, reliability, and efficient operation of infrastructure that powers Grafana Cloud.
Working model
- Remote position, open only to applicants located in the United States (US time zones).
- The team is remote-first and collaborative. Participation in on-call rotations is required because the squad deploys production services.
Responsibilities
- Automate provisioning and lifecycle management of Kubernetes clusters and CSP resources.
- Manage cluster networking components: load balancing, NAT, DNS, CNIs, network policies, private connectivity for customers, and cross-cluster communication.
- Manage scheduling and autoscaling for workloads.
- Maintain Crossplane compositions and Terraform modules for common CSP resources, including versioning and compatibility of Crossplane, Terraform core, and providers.
- Work closely with Grafana Cloud application teams to understand requirements and invest in the right platform capabilities.
- Participate in the on-call rotation of the Platform department's Infrastructure wing and operate the production services you build.
Requirements
- Experience with Kubernetes cluster provisioning and lifecycle management.
- Experience managing cluster networking (load balancing, NAT, DNS, CNI, network policies, private connectivity, cross-cluster communication).
- Experience with scheduling and autoscaling in Kubernetes environments.
- Experience maintaining Crossplane compositions and Terraform modules, and managing versioning/compatibility for Crossplane and Terraform (core and providers).
- Comfortable working with distributed systems and operating your own code in production (on-call responsibilities).
- Familiarity with Go, Python, and Shell (the Platform team mainly works with these languages).
Bonus Points
- Experience contributing to open source or community-based projects.
- Experience with multiple cloud providers and managed Kubernetes services (AWS EKS, GCP GKE, Azure AKS).
- Experience operating and managing workloads on Kubernetes, and with configuration management using Tanka/Jsonnet.
- Familiarity with Kubernetes scheduling projects such as Karpenter.
- Strong Terraform and/or Crossplane experience.
- Interest in, or experience with, building tools in Go and thinking about operational maturity.
A few upcoming projects
- Improving Kubernetes efficiency (processor architecture, CSP capacity, machine selection, multi-AZ utilization).
- Kubernetes cluster lifecycle automation across many CSP regions.
- Improving Terraform ease of use alongside Crossplane delivery strategies.
Compensation & Benefits
- United States base compensation range: USD 154,445 - USD 185,334 (actual compensation depends on level, experience, and skills).
- The role includes Restricted Stock Units (RSUs). Additional benefits, equity, and a potential bonus apply as listed by the company.
- 30 days of annual leave (global policy), with 3 reserved for Grafana Shutdown Days. In-person onboarding is provided.
Equal opportunity & recruiting notes
- Grafana Labs is an equal opportunity employer.
- The company may use AI tools in recruitment to help match CVs to job postings; the recruitment team also reviews CVs manually.