Senior Cloud Infrastructure Engineer

πŸ“ Switzerland
πŸ“ Germany
πŸ“ Spain
πŸ“ France
πŸ“ United Kingdom
πŸ“ Netherlands
πŸ“ Zurich, Switzerland
πŸ“ Munich, Germany
πŸ“ Berlin, Germany
πŸ“ Paris, France
πŸ“ London, United Kingdom
EUR 90,000-160,000 per year
SENIOR
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4 Docker @ 4 Kubernetes @ 4 Redis @ 4 IaC @ 4 Terraform @ 4 TypeScript @ 4 CI/CD @ 4 Datadog @ 4 Hiring @ 4 AWS @ 4 Helm @ 4 Networking @ 4 PostgreSQL @ 4 SRE @ 7 Next.js @ 3 CloudFormation @ 4 Experimentation @ 4 LLM @ 4 Compliance @ 4 Observability @ 4 AI @ 4 ClickHouse @ 4

Details

Langfuse is an open source LLM engineering platform focused on tracing, evaluation, and prompt management. The company is now part of ClickHouse and runs both a cloud and a self-hosted offering trusted by large customers. The team is engineering-heavy, with offices in Berlin and San Francisco, and is hiring engineers in EU time zones. The role is hybrid and expects roughly one week per month in the Berlin office.

Responsibilities

  • Own Langfuse Cloud operations: run production environments on AWS (ECS / Fargate) and ClickHouse Cloud, manage deployments, autoscaling, capacity planning, and cost optimization.
  • Build and maintain world-class observability: own Datadog setup end-to-end (dashboards, alerts, SLOs) and ensure early detection of degradations.
  • Make self-hosting effortless: own and evolve Helm charts, Docker Compose configuration, and deployment documentation for single-node to multi-region enterprise deployments.
  • Automate everything: implement and improve CI/CD pipelines, infrastructure-as-code, automated scaling, and zero-downtime deployments.
  • Scale for future product directions: design infrastructure to handle 10x growth and new features (long-running agents, real-time evaluation).
  • Harden security and compliance for cloud and self-hosted deployments as enterprise adoption grows.

Requirements

  • Strong infrastructure or SRE experience running systems at scale and improving reliability.
  • Experience operating production workloads on AWS (ECS/Fargate, networking, IAM, S3) or a comparable hyperscale cloud provider.
  • Comfortable with container orchestration (Kubernetes and/or ECS), Helm charts, and Docker.
  • Experience with infrastructure-as-code (Terraform, Pulumi, CloudFormation, or similar).
  • Strong monitoring and observability instincts; experience building dashboards and alerts that catch real problems (Datadog experience is a plus).
  • Strong opinions and discipline around reliability, automation, and safe infrastructure change processes.
  • Interest in open source and willingness to help users debug self-hosted deployments.
  • Comfortable working in a small, accountable team where individual output is visible.
  • CS or quantitative degree preferred.

Bonus (nice-to-have)

  • Experience with ClickHouse Cloud or other managed analytical databases.
  • Background operating high-throughput event processing or observability infrastructure.
  • Contributions to open source infrastructure tooling (Helm charts, Terraform modules, etc.).
  • Former founder.

Tech Stack & Tools

  • Cloud / infra: AWS (ECS / Fargate), ClickHouse Cloud, S3, IAM, networking
  • Orchestration & packaging: Kubernetes, Helm charts, Docker, Docker Compose
  • IaC & automation: Terraform, Pulumi, CloudFormation, CI/CD pipelines
  • Observability: Datadog (dashboards, alerts, SLOs)
  • Data & storage: ClickHouse (tracing), PostgreSQL, Redis
  • Application stack mentioned: TypeScript monorepo (Next.js frontend, Express workers); familiarity is helpful when collaborating with product teams

Process & Culture

  • Fast hiring process (the company states the full process to offer can take less than 7 days).
  • Emphasis on ownership: engineers propose solutions (RFCs) and ship them; code reviews are used for mentorship.
  • Maker schedule with limited recurring meetings (weekly check-in and Friday demo). The team encourages experimentation with AI tooling and collaboration across the org.