Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 4
Grafana @ 4
Kubernetes @ 4
IaC @ 4
Terraform @ 4
Python @ 4
Distributed Systems @ 4
Leadership @ 4
Communication @ 7
Rust @ 4
Microservices @ 4
Technical Leadership @ 4
Compliance @ 4
Codex @ 4
Observability @ 4
AI @ 4
Details
Grafana Labs is the company behind the open observability cloud. Grafana Cloud is a fully managed observability platform built for scale and used by millions. The company is 100% remote with a global team and emphasizes open source, open standards, and a collaborative culture.
This is a remote role for applicants located in Canadian time zones (EST and CST strongly preferred).
Responsibilities
- Work on the Internal Engineering Platform (IEP) delivered by the Platform department, providing application engineers with tools, systems and Kubernetes clusters to build, deploy and run workloads.
- Join the Platform SysEng squad focused on platform maturity and scalability, reducing new region build timelines and improving performance, reliability, and efficiency.
- Take projects from conception to production; own the full lifecycle of code including design docs, developer feedback, and integration testing.
- Participate in on-call rotations to ensure health of production services and to better understand system usage and operational requirements.
- Collaborate across engineering teams to manage cloud infrastructure, capacity management, security, engineering productivity, monitoring and sustainability, and compliance where applicable.
Requirements
- Proven delivery of large distributed systems; experience shipping and operating complex systems that span multiple teams with clear technical leadership and impact.
- Demonstrable experience in system design with a deep understanding of tradeoffs around latency, consistency, availability, scaling and cost.
- Hands-on cloud and platform experience with cloud-native architectures (microservices, containers/Kubernetes, IaC) and the operational practices that keep them healthy.
- Reliability and performance ownership: comfortable defining SLOs/SLIs, capacity planning, tuning performance, and driving reliability work end-to-end.
- Excellent coding and design skills; write clear, maintainable, well-tested code. Primary language is Go, but experience in Python/C/C++/Rust or similar is transferable.
- Comfort with AI-assisted development and practical experience folding AI-powered developer tools into a team’s workflow.
- Influence without authority: ability to align cross-functional stakeholders, set priorities and drive outcomes in a remote-first environment.
- Strong written and verbal communication skills suitable for both technical and non-technical stakeholders.
Bonus Points For
- Experience with open source or community-based projects.
- Familiarity with Kubernetes scheduling and projects like Karpenter.
- Experience with Terraform and/or Crossplane (Grafana Labs uses mixed approaches).
- Experience with Tanka and/or Jsonnet.
Compensation & Rewards
- In Canada, the base compensation range for this role is CAD 186,368 - CAD 223,642. Actual compensation may vary based on level, experience, and skillset.
- All roles include Restricted Stock Units (RSUs). Benefits include equity, potential bonus (if applicable), and other benefits listed on the company careers page.
Why You’ll Thrive at Grafana Labs
- 100% remote, global culture; scaling organization with meaningful work and transparent decision-making.
- Autonomous, innovation-driven teams with open-source roots and a high-trust, low-ego culture.
- Career growth pathways, approachable leadership, and a global annual leave policy (30 days per annum; 3 days reserved for Grafana Shutdown Days).
- In-person onboarding to help new hires integrate.
- Company-funded usage budget for AI-assisted development and access to frontier models (examples listed include GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro).
Other
- On-call rotations are part of the role to maintain production service health.
- Grafana Labs is an equal opportunity employer and will comply with local legislation where applicable.
- Applicants are encouraged to apply even if they do not meet every requirement.