Senior Cloud Infrastructure Engineer
at Langfuse
Switzerland
Germany
Spain
France
United Kingdom
Netherlands
Zurich, Switzerland
Munich, Germany
Berlin, Germany
Paris, France
London, United Kingdom
EUR 90,000-160,000 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
Docker @ 4
Kubernetes @ 4
Redis @ 4
IaC @ 4
Terraform @ 4
TypeScript @ 4
CI/CD @ 4
Datadog @ 4
Hiring @ 4
AWS @ 4
Helm @ 4
Networking @ 4
PostgreSQL @ 4
SRE @ 7
Next.js @ 3
CloudFormation @ 4
Experimentation @ 4
LLM @ 4
Compliance @ 4
Observability @ 4
AI @ 4
ClickHouse @ 4
Details
Langfuse is an open source LLM engineering platform focused on tracing, evaluation, and prompt management. The company is now part of ClickHouse and runs a cloud and a self-hosted offering trusted by large customers. The team is engineering-heavy, with offices in Berlin and San Francisco, and is hiring engineers in EU time zones. The role is hybrid and expects roughly one week per month in the Berlin office.
Responsibilities
- Own Langfuse Cloud operations: run production environments on AWS (ECS / Fargate) and ClickHouse Cloud, manage deployments, autoscaling, capacity planning, and cost optimization.
- Build and maintain world-class observability: own the Datadog setup end to end (dashboards, alerts, SLOs) and ensure degradations are detected early.
- Make self-hosting effortless: own and evolve the Helm charts, Docker Compose configuration, and deployment documentation for everything from single-node setups to multi-region enterprise deployments.
- Automate everything: implement and improve CI/CD pipelines, infrastructure-as-code, automated scaling, and zero-downtime deployments.
- Scale for future product directions: design infrastructure to handle 10x growth and new features (long-running agents, real-time evaluation).
- Harden security and compliance for cloud and self-hosted deployments as enterprise adoption grows.
Requirements
- Strong infrastructure or SRE experience running systems at scale and improving reliability.
- Experience operating production workloads on AWS (ECS/Fargate, networking, IAM, S3) or a comparable hyperscale cloud provider.
- Comfortable with container orchestration (Kubernetes and/or ECS), Helm charts, and Docker.
- Experience with infrastructure-as-code (Terraform, Pulumi, CloudFormation, or similar).
- Strong monitoring and observability instincts; experience building dashboards and alerts that catch real problems (Datadog experience is a plus).
- Strong opinions and discipline around reliability, automation, and safe infrastructure change processes.
- Interest in open source and willingness to help users debug self-hosted deployments.
- Comfortable working in a small, accountable team where individual output is visible.
- CS or quantitative degree preferred.
Bonus (nice-to-have)
- Experience with ClickHouse Cloud or other managed analytical databases.
- Background operating high-throughput event processing or observability infrastructure.
- Contributions to open source infrastructure tooling (Helm charts, Terraform modules, etc.).
- Former founder.
Tech Stack & Tools
- Cloud / infra: AWS (ECS / Fargate), ClickHouse Cloud, S3, IAM, networking
- Orchestration & packaging: Kubernetes, Helm charts, Docker, Docker Compose
- IaC & automation: Terraform, Pulumi, CloudFormation, CI/CD pipelines
- Observability: Datadog (dashboards, alerts, SLOs)
- Data & storage: ClickHouse (tracing), PostgreSQL, Redis
- Application stack mentioned: TypeScript monorepo (Next.js frontend, Express workers); familiarity is helpful when collaborating with product teams
Process & Culture
- Fast hiring process: according to the company, the full process through to an offer can take less than 7 days.
- Emphasis on ownership: engineers propose solutions (RFCs) and ship them; code reviews are used for mentorship.
- Maker schedule with limited recurring meetings (weekly check-in and Friday demo). The team encourages experimentation with AI tooling and collaboration across the org.