Senior Software Engineer - Grafana Databases, Managed Services

GBP 91,800-110,100 per year
SENIOR
✅ Remote

Used Tools & Technologies

PostgreSQL

Required Skills & Competences

Security @ 4, Grafana @ 4, Kafka @ 4, Kubernetes @ 4, Linux @ 4, Terraform @ 3, GCP @ 3, Distributed Systems @ 4, AWS @ 3, Azure @ 3, Communication @ 4, Helm @ 3, Networking @ 4, SRE @ 7, Cassandra @ 4, Snowflake @ 4, Observability @ 4, AI @ 4, ClickHouse @ 4

Details

Grafana Labs is a remote-first, open-source company with more than 20M users of Grafana and a managed product offering (Grafana Cloud) used by thousands of companies. The Managed Services team within the Databases department owns and operates shared, production-critical infrastructure that powers Grafana Cloud’s database products (Mimir, Loki, Tempo). This includes operating 100+ WarpStream clusters across multiple cloud providers and regions and working with high-volume analytical and storage systems where latency, compression, storage layout, and scaling characteristics matter deeply.

Role overview

This is a Senior Engineer role on the Managed Services team focused on operating and evolving multi-cloud streaming clusters and related database infrastructure. The role blends deep distributed systems work with reliability, scaling, and operational excellence, and includes on-call responsibilities.

Responsibilities

  • Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure.
  • Diagnose and eliminate cross-layer failure modes (object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions, etc.).
  • Design safe upgrade and rollout strategies at scale.
  • Improve observability, automation, and operational ergonomics.
  • Partner closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance.
  • Work directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, and compression trade-offs.
  • Serve as a primary escalation point and participate in on-call rotations and incident response.
  • Own vendor relationships (e.g., WarpStream Labs and others).
  • Participate in PR review, design documents, automation, tooling, and post-incident reviews.

Requirements

  • 6+ years of engineering experience, including meaningful time in SRE, platform engineering, production/infrastructure engineering, or distributed systems roles.
  • Experience operating distributed systems in production (examples: Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, Cassandra).
  • Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.).
  • Solid understanding of distributed systems design and large-scale system trade-offs.
  • Proficiency in at least one programming language (Go preferred).
  • Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior.
  • Experience participating in blameless incident response and writing high-quality post-incident reviews.
  • Clear communication skills and ability to work autonomously in a remote environment.

Compensation & Rewards

  • Base compensation range (United Kingdom): GBP 91,755 - GBP 110,106.
  • Roles include Restricted Stock Units (RSUs). Other benefits, equity, and bonus information are available via the company careers page.

Location & work arrangement

  • This is a remote role open only to applicants based in UK time zones. Grafana Labs is remote-first and uses video conferencing for collaboration.
  • The role includes an on-call component, with coverage coordinated across regions (target: roughly 12 daylight hours of coverage per day).
  • In-person onboarding is provided.

Other notes

  • Grafana Labs permits the use of modern AI coding assistants within its security guidelines and provides a company-funded usage budget. Frontier models (examples given) are available as optional tools.
  • Grafana Labs is an equal opportunity employer and provides a global remote culture with defined career growth pathways and company-wide transparency.