Staff Software Engineer - Grafana Cloud k6 | USA | Remote
at Grafana Labs
USD 175,000-210,000 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Docker @ 4
Go @ 7
Grafana @ 4
Kubernetes @ 4
DevOps @ 4
Python @ 7
Distributed Systems @ 4
Leadership @ 4
AWS @ 4
Communication @ 4
JavaScript @ 4
SRE @ 4
Prioritization @ 4
Reporting @ 4
Observability @ 4
Change Management @ 4
Details
Grafana Labs is a remote-first, open-source powerhouse. The team behind Grafana k6, Grafana Cloud k6, and Grafana Cloud Synthetics builds and operates performance testing products used globally to run distributed tests from 15+ regions, ingest huge volumes of data, and enable analysis of metrics generated by k6. This role is with the Grafana Cloud k6 squad and focuses on establishing and scaling a cross-team culture of engineering excellence, driving DevOps/SRE practices, and growing into broader application and product development leadership as the reliability foundation matures.
Responsibilities
- Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
- Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
- Establish reliability frameworks such as SLIs/SLOs and error budgets, and use them to guide prioritization and engineering trade-offs.
- Provide visibility into system health through clear operational metrics and reliability reporting.
- Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
- Influence product and system direction through design reviews, architectural discussions, and cross-team collaboration.
- Share knowledge through clear, high-quality documentation and technical communication internally and, where appropriate, externally.
- As the reliability foundation matures, expand into broader application and product development leadership, contributing architectural and technical depth beyond operations.
Requirements
- Strong experience with DevOps/SRE practices, including operating and evolving production systems at scale.
- Strong programming background in a modern language (Python and Go are the primary languages, but prior experience with them is not required).
- Experience designing, building, and operating large-scale distributed systems.
- Strong understanding of reliability engineering concepts (e.g. incident management, observability, and failure modes).
- Experience with test automation, including performance and functional testing.
- Ability to influence engineering practices through clear technical communication, reviews, and collaboration.
- Strong interpersonal skills and ability to work effectively across teams.
- Familiarity with modern software engineering processes and delivery practices.
- Self-driven and comfortable operating with a high degree of autonomy and ambiguity.
Bonus / Nice-to-have
- Experience with containerized and cloud-native systems (Docker, Kubernetes, AWS).
- Familiarity with observability tooling and platforms (e.g. the Grafana stack).
- Experience working with Python, Go, JavaScript and/or Jsonnet.
- Experience building or operating event-driven or asynchronous systems.
- Experience defining or applying SLIs/SLOs, error budgets, or reliability metrics.
- Interest in, or experience with, building testing frameworks or developer tooling.
Compensation & Rewards
- In the United States, the base compensation range for this role is $174,986 - $209,983. Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process.
- All roles include Restricted Stock Units (RSUs).
- Compensation ranges are country specific; recruiters will discuss market-specific pay ranges where applicable.
Why You’ll Thrive at Grafana Labs
- 100% remote, global culture; applicants in United States time zones are encouraged.
- Scaling organization with meaningful work in a high-growth environment.
- Transparent communication and innovation-driven culture.
- Open source roots and empowered teams.
- In-person onboarding to support your day-one ramp-up.
- Global annual leave policy of 30 days per annum (with 3 days reserved for Grafana Shutdown Days), subject to local legislation.
Equal Opportunity & Privacy
- Grafana Labs is an equal opportunity employer. Information about how personal data is used is available in the applicant privacy policy.