Staff Software Engineer - Grafana Databases, Managed Services

📍 Ireland
EUR 104,000-124,800 per year
SENIOR
✅ Remote

Used Tools & Technologies

PostgreSQL

Required Skills & Competences

Grafana, Kafka, Kubernetes, Linux, Terraform, GCP, Distributed Systems, AWS, Azure, Helm, Networking, SRE, Cassandra, Snowflake, Observability, AI, ClickHouse

Details

Grafana Labs is a remote-first, open-source company with more than 20M users of Grafana and customers including Bloomberg, JPMorgan Chase, and eBay. The Managed Services squad within the Databases department operates shared, production-critical infrastructure that powers Grafana Cloud’s next-generation database products (Mimir, Loki, and Tempo). This includes operating 100+ WarpStream clusters across multiple cloud providers and regions, and working closely with high-volume analytical and storage systems where latency, compression behavior, storage layout, and scaling characteristics matter.

Responsibilities

  • Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure
  • Diagnose and eliminate cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions)
  • Design safe upgrade and rollout strategies at scale
  • Improve observability, automation, and operational ergonomics
  • Partner with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
  • Work hands-on with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, and compression trade-offs
  • Serve as a primary escalation point and participate in the on-call rotation for relevant incidents
  • Own relationships with system vendors (including WarpStream Labs and others)
  • At Staff level: define and evolve the technical direction for operating WarpStream and adjacent systems; lead complex initiatives (migrations, rollouts, reliability investments); establish best practices for SLOs, scaling limits, failure isolation, and change safety; investigate and resolve multi-layer incidents; identify systemic risks across clusters; drive automation to reduce toil; mentor engineers; and influence architecture and long-term scalability strategy

Requirements

  • 8+ years of engineering experience, including SRE, platform, production, infrastructure, or distributed systems roles
  • Experience with high-throughput streaming systems, analytical or storage backends, or large-scale database infrastructure (e.g., Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, Cassandra)
  • Strong Kubernetes experience in AWS, GCP, or Azure
  • Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet)
  • Experience leading or driving complex technical efforts (not necessarily with formal management responsibilities)
  • Strong understanding of distributed systems failure modes in multi-cloud environments
  • Proficiency in at least one systems-oriented language (Go preferred)
  • Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior
  • Experience participating in blameless incident response and writing high-quality post-incident reviews
  • Clear communicator who can collaborate across teams and work autonomously

Compensation & Rewards

  • Base compensation range (Ireland): EUR 104,000 - EUR 124,800 (actual compensation may vary based on level, experience, and skillset)
  • Compensation includes Restricted Stock Units (RSUs); bonus eligibility (if applicable) and other benefits are listed on the company site

Other Details

  • This is a remote opportunity; at this time, applicants should be located in Ireland time zones only
  • The role includes an on-call component; the organization hires globally to keep on-call coverage balanced, with engineers covering roughly 12 daylight hours per day
  • The company supports pragmatic AI-assisted development (company-funded usage budget for approved AI coding assistants)
  • Equal opportunity employer policies and privacy policy links are provided in the posting