Senior Software Engineer - Grafana Databases, Managed Services
Used Tools & Technologies
PostgreSQL
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
Grafana @ 4
Kafka @ 4
Kubernetes @ 4
Linux @ 4
Terraform @ 3
GCP @ 7
Distributed Systems @ 4
Leadership @ 4
AWS @ 7
Azure @ 7
Helm @ 3
Networking @ 4
SRE @ 7
Cassandra @ 4
Snowflake @ 4
Observability @ 4
AI @ 4
ClickHouse @ 4
Details
Grafana Labs is a remote-first, open-source company with more than 20M users of Grafana. The Managed Services team within the Databases department owns and operates shared, production-critical infrastructure that powers Grafana Cloud’s next-generation database products (Mimir, Loki, and Tempo). This role is remote, but the team is only considering applicants based in Spain time zones.
Responsibilities
- Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure
- Diagnose and eliminate cross-layer failure modes (object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions)
- Design safe upgrade and rollout strategies at scale
- Improve observability, automation, and operational ergonomics
- Partner with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
- Work directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, compression trade-offs, etc.
- Serve as a primary escalation point and participate in on-call for relevant incidents
- Own relationships with system vendors (including WarpStream Labs and others)
- Communicate and collaborate remotely with regular video calls; work autonomously
Requirements
- 6+ years of engineering experience, including meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles
- Experience operating distributed systems in production (examples: Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, Cassandra)
- Strong Kubernetes experience in AWS, GCP, or Azure
- Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet)
- Solid understanding of distributed systems design and large-scale system trade-offs
- Proficiency in at least one programming language (Go preferred, but not required)
- Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior
- Experience with blameless incident response and writing high-quality post-incident reviews
- Clear communicator comfortable collaborating across teams
- Willingness to participate in on-call rotations and incident response
Compensation & Rewards
- Spain base compensation range: EUR 82,988 - EUR 99,586
- All roles include Restricted Stock Units (RSUs) and other benefits; additional benefits and bonus information available via Grafana Labs careers pages
Why You’ll Thrive at Grafana Labs
- 100% remote, global culture with emphasis on transparency, autonomy, and collaboration
- Opportunity to work on high-scale distributed systems and production-critical infrastructure
- Access to modern AI coding assistants with a company-funded usage budget (within security guidelines)
- Defined career growth pathways, approachable leadership, and a culture that values outcomes
- In-person onboarding to support early ramp and team integration
- Global annual leave policy (30 days per annum, with 3 reserved Grafana Shutdown Days; local legislation applies)
Equal Opportunity Employer
Grafana Labs recruits, trains, compensates, and promotes regardless of race, religion, color, national origin, gender, disability, age, veteran status, and other characteristics. The company may utilize AI tools in recruitment to assist in matching CVs to job postings. For information about personal data usage after applying, see Grafana Labs' privacy policy.