Staff Software Engineer - Platform, SysEng

USD 175,000-210,000 per year
SENIOR
✅ Remote

Required Skills & Competences

Go @ 4, Grafana @ 4, Kubernetes @ 4, IaC @ 4, Terraform @ 4, Python @ 4, Distributed Systems @ 4, Leadership @ 4, Communication @ 7, Rust @ 4, Microservices @ 4, Technical Leadership @ 4, Observability @ 4, AI @ 4

Details

Grafana Labs is the company behind the open observability cloud and Grafana Cloud, a fully managed observability platform used by millions. We are a 100% remote company with a global engineering organization focused on building scalable, reliable systems. This role is within the Internal Engineering Platform (IEP) and the Platform SysEng squad, which focuses on platform maturity and scalability across engineering teams. This is a remote opportunity for candidates located in the United States (Eastern and Central time zones highly preferred).

Responsibilities

  • Reduce new region build timelines and improve platform maturity and scalability.
  • Work across engineering teams to manage and evolve the platform that supports services like Grafana, Mimir, Loki, Tempo, and Pyroscope.
  • Own production services and participate in on-call rotations to ensure system health and reliability.
  • Take projects from conception to production: write design docs, implement changes, integrate testing, and incorporate developer feedback.
  • Define and drive reliability and performance work end-to-end, including SLOs/SLIs and capacity planning.
  • Collaborate with cross-functional stakeholders in a remote-first environment and influence outcomes without direct authority.

Requirements

  • Proven delivery of large distributed systems and evidence of technical leadership and impact.
  • Demonstrable experience in system design, with deep understanding of trade-offs around latency, consistency, availability, scaling, and cost.
  • Hands-on cloud and platform experience with cloud-native architectures (microservices, containers/Kubernetes, IaC) and operational practices to keep them healthy.
  • Reliability and performance ownership: defining SLOs/SLIs, capacity planning, tuning performance, and driving reliability projects.
  • Excellent coding and design skills; ability to write clear, maintainable, well-tested code. The primary language used at Grafana Labs is Go (experience in Python, C, C++, Rust, or similar translates well).
  • Comfortable using AI-assisted development tools and integrating them into a team workflow; company provides access and budget for such tools.
  • Strong written and verbal communication skills, effective with both engineers and non-technical stakeholders.
  • Comfortable working in a remote-first company and collaborating across distributed teams.

Technologies & Tools Mentioned

  • Languages: Go, Python, Shell (C, C++, and Rust experience is noted as transferable).
  • Cloud-native: Kubernetes, containers, microservices.
  • Infrastructure as Code: Terraform and/or Crossplane.
  • Kubernetes scheduling tooling (example: Karpenter).
  • Templating/manifest tools: Tanka and Jsonnet.
  • Observability stacks: Grafana, Mimir, Loki, Tempo, Pyroscope.
  • Practices: SLOs/SLIs, capacity planning, performance tuning, incident/on-call operations.
  • AI-assisted development tools (the posting references access to frontier models for developer use).

Bonus Points

  • Experience in open source or community-based projects.
  • Familiarity with Kubernetes scheduling and projects like Karpenter.
  • Terraform and/or Crossplane experience.
  • Experience with Tanka and/or Jsonnet.

Compensation & Rewards

  • Base compensation range in the United States: USD 174,986 - USD 209,983 (actual compensation may vary by level, experience, and skillset).
  • Roles include Restricted Stock Units (RSUs) and may include a bonus (if applicable) and other benefits.
  • Company provides a developer AI usage budget and access to frontier models for development workflows.

Other Details

  • 100% remote company culture; in-person onboarding is provided.
  • On-call rotations are part of operating production services.
  • Global annual leave policy: 30 days per annum (with 3 days reserved for Grafana Shutdown Days), subject to local legislation.
  • Grafana Labs is an equal opportunity employer and may use AI tools in the recruitment process.