Senior Software Engineer - Platform Infracore

📍 Canada
CAD 164,500-197,400 per year
SENIOR
✅ Remote

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ?

Go @ 3 Grafana @ 4 Kubernetes @ 4 Terraform @ 4 Python @ 3 GCP @ 4 Distributed Systems @ 4 AWS @ 4 Azure @ 4 Communication @ 4 Networking @ 4

Details

Grafana Labs is a remote-first, open-source company powering observability for millions of users. The Platform group builds and operates the Internal Engineering Platform (IEP) that provides application engineers with tooling, systems, and Kubernetes clusters to build, deploy, and run workloads. The InfraCore squad focuses on automation for provisioning CSP resources, networking, Kubernetes clusters, and other cloud resources required by application teams. This is a remote role open to applicants in Canadian time zones only.

Responsibilities

  • Design, implement, and operate automation for provisioning cloud service provider (CSP) resources, including networking, Kubernetes clusters, and other CSP-specific resources.
  • Own and improve cluster lifecycle automation, capacity management, and scheduling/autoscaling for managed Kubernetes clusters.
  • Manage cluster networking components: load balancing, NAT, DNS, CNIs, network policies, private connectivity for customers, and cross-cluster communication.
  • Maintain Crossplane compositions and Terraform modules for common CSP resources; manage versioning and compatibility for Crossplane, Terraform core, and providers.
  • Collaborate with internal application teams to understand needs and ensure the platform invests in the right capabilities.
  • Participate in the Platform department Infrastructure on-call rotation and operate production services.

Requirements

  • Strong experience with Kubernetes cluster provisioning and lifecycle management.
  • Experience managing cluster networking (load balancers, NAT, DNS, CNIs, network policies, private connectivity) and cross-cluster communication.
  • Knowledge of scheduling and autoscaling for Kubernetes workloads.
  • Experience maintaining Crossplane compositions and Terraform modules; understanding of versioning and provider compatibility.
  • Comfortable working with distributed systems and operating your own code in production (on-call experience expected).
  • Collaboration skills to work with engineers and management in a remote-first environment.
  • Familiarity with programming and scripting; the Platform team primarily uses Go, Python, and Shell.

Bonus / Nice to Have

  • Experience with open source or community-driven projects.
  • Experience across multiple CSPs (AWS, GCP, Azure) and their managed Kubernetes services (EKS, GKE, AKS).
  • Experience operating workloads on Kubernetes and using Tanka/Jsonnet for configuration management.
  • Familiarity with Kubernetes scheduling projects like Karpenter.
  • Strong Terraform and/or Crossplane experience.
  • Comfortable programming in Go and building internal tools, exporters, or utilities.
  • Interest in operational maturity models for platform engineering.

A few upcoming projects

  • Improving Kubernetes efficiency (processor architecture, CSP capacity, machine selection, multi-AZ utilization).
  • K8s cluster lifecycle automation across many CSP regions.
  • Improving Terraform ease-of-use and Crossplane delivery strategy.

Compensation & Benefits

  • Base compensation range in Canada: CAD 164,490 - CAD 197,389 (actual compensation may vary by level, experience, and skills).
  • All roles include Restricted Stock Units (RSUs). Benefits include equity, potential bonus, and other company benefits.
  • 100% remote company culture, in-person onboarding, global annual leave policy (30 days per annum; 3 days reserved for Grafana Shutdown Days), and additional employee benefits.

Additional notes

  • This role is remote but restricted to applicants in Canadian time zones. Candidates may be expected to participate in on-call rotations and in-person onboarding events.
  • Grafana Labs values diversity, open-source contributions, and a collaborative remote-first culture.