Senior Software Engineer - Grafana Databases, Managed Services
at Grafana Labs
📍 Ireland
EUR 104,000-124,800 per year
Used Tools & Technologies
PostgreSQL
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Grafana @ 4
Kafka @ 4
Kubernetes @ 4
Linux @ 4
Terraform @ 3
GCP @ 7
Distributed Systems @ 4
AWS @ 7
Azure @ 7
Helm @ 3
Networking @ 4
SRE @ 7
Cassandra @ 4
Snowflake @ 4
Observability @ 4
AI @ 4
ClickHouse @ 4
Details
Grafana Labs is a remote-first, open-source company with a global user base for Grafana and related observability projects (Mimir, Loki, Tempo). The Managed Services squad within the Databases department operates shared, production-critical infrastructure that powers Grafana Cloud’s database products. This role is remote, and applicants should be located in the Ireland time zone.
Responsibilities
- Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure
- Diagnose and eliminate cross-layer failure modes (object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions, etc.)
- Design safe upgrade and rollout strategies at scale
- Improve observability, automation, and operational ergonomics
- Partner with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
- Work directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, compression trade-offs, and related areas
- Serve as a primary escalation point and participate in on-call rotations for incidents
- Own relationships with system vendors (including WarpStream Labs and others)
- Contribute to reliability, scaling, and operational excellence; participate in post-incident reviews and follow-ups
Requirements
- 6+ years of engineering experience, including SRE, platform engineering, production/infrastructure engineering, or distributed systems roles
- Experience operating distributed systems in production (for example, Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, Cassandra)
- Strong Kubernetes experience in AWS, GCP, or Azure
- Familiarity with infrastructure-as-code tooling such as Helm, Terraform, Jsonnet
- Solid understanding of distributed systems design and large-scale system trade-offs
- Proficiency in at least one programming language (Go preferred)
- Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior
- Experience in blameless incident response and writing high-quality post-incident reviews
- Clear communicator, able to collaborate across teams and work autonomously
Compensation & Rewards
- Base compensation range in Ireland: EUR 104,000 - EUR 124,800 (actual compensation may vary based on level and experience)
- The role includes equity in the form of Restricted Stock Units (RSUs) and other benefits, including a possible bonus. Additional benefits and details are available via the company careers pages.
Additional Details
- Remote-first role; applicants must be located in the Ireland time zone
- On-call responsibilities are shared across regions to provide balanced coverage
- The company encourages the use of AI-assisted development tools (a company-funded usage budget is available)
- In-person onboarding is provided