Staff Software Engineer - Grafana Cloud k6 | USA | Remote

at Grafana Labs

📍 United States

USD 175,000-210,000 per year

SENIOR

✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels are defined as follows:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Docker @ 4, Go @ 7, Grafana @ 4, Kubernetes @ 4, DevOps @ 4, Python @ 7, Distributed Systems @ 4, Leadership @ 4, AWS @ 4, Communication @ 4, JavaScript @ 4, SRE @ 4, Prioritization @ 4, Reporting @ 4, Observability @ 4, Change Management @ 4

Details

Grafana Labs is a remote-first, open-source powerhouse. The team behind Grafana k6, Grafana Cloud k6, and Grafana Cloud Synthetics builds and operates performance testing products used globally to run distributed tests from 15+ regions, ingest huge volumes of data, and enable analysis of metrics generated by k6. This role is with the Grafana Cloud k6 squad and focuses on establishing and scaling a cross-team culture of engineering excellence, driving DevOps/SRE practices, and growing into broader application and product development leadership as the reliability foundation matures.

Responsibilities

Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Establish reliability frameworks such as SLIs/SLOs and error budgets, and use them to guide prioritization and engineering trade-offs.
Provide visibility into system health through clear operational metrics and reliability reporting.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration.
Share knowledge through clear, high-quality documentation and technical communication internally and, where appropriate, externally.
As the reliability foundation matures, expand into broader application and product development leadership, contributing architectural and technical depth beyond operations.

Requirements

Strong experience with DevOps/SRE practices, including operating and evolving production systems at scale.
Strong programming background in a modern language (Python and Go are the primary languages, but prior experience with them is not required).
Experience designing, building, and operating large-scale distributed systems.
Strong understanding of reliability engineering concepts (e.g. incident management, observability, and failure modes).
Experience with test automation, including performance and functional testing.
Ability to influence engineering practices through clear technical communication, reviews, and collaboration.
Strong interpersonal skills and ability to work effectively across teams.
Familiarity with modern software engineering processes and delivery practices.
Self-driven and comfortable operating with a high degree of autonomy and ambiguity.

Bonus / Nice-to-have

Experience with containerized and cloud-native systems (Docker, Kubernetes, AWS).
Familiarity with observability tooling and platforms (e.g. the Grafana stack).
Experience working with Python, Go, JavaScript and/or Jsonnet.
Experience building or operating event-driven or asynchronous systems.
Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics.
Interest in, or experience with, building testing frameworks or developer tooling.

Compensation & Rewards

In the United States, the base compensation range for this role is $174,986 - $209,983. Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process.
All roles include Restricted Stock Units (RSUs).
Compensation ranges are country specific; recruiters will discuss market-specific pay ranges where applicable.

Why You’ll Thrive at Grafana Labs

100% remote, global culture, with applicants in United States time zones encouraged to apply.
Scaling organization with meaningful work in a high-growth environment.
Transparent communication and innovation-driven culture.
Open source roots and empowered teams.
In-person onboarding to support your ramp-up from day one.
Global annual leave policy of 30 days per annum (with 3 days reserved for Grafana Shutdown Days), subject to local legislation.

Equal Opportunity & Privacy

Grafana Labs is an equal opportunity employer. Information about how personal data is used is available in the applicant privacy policy.