Software Engineer, Observability

at OpenAI
USD 255,000-405,000 per year
MIDDLE
✅ On-site
✅ Relocation

Used Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes @ 6 Prometheus @ 1 Distributed Systems @ 3 Hiring @ 3 AWS @ 6 Networking @ 6 Debugging @ 3 OpenTelemetry @ 1 Observability @ 3 AI @ 3

Details

Join the engineering teams that bring OpenAI’s ideas safely to the world!

The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. The team focuses on learning from deployment and distributing the benefits of AI while ensuring responsible and safe use.

We’re building the observability product for OpenAI — from scalable infrastructure to a rich, AI-powered UI. Our systems ingest petabytes of logs and billions of time series metrics across our fleet. We're layering intelligence on top (agents that summarize SEVs, auto-generate dashboards, or help engineers debug through notebook-like UIs).

You’ll join a small team building foundational infrastructure and internal tools to make OpenAI's production systems reliable, performant, and observable.

Responsibilities

  • Own core observability infrastructure, including distributed logging, time series, and trace storage.
  • Build AI-native tools that help engineers detect, understand, and resolve issues autonomously.
  • Contribute to UI experiences like dashboards, notebooking, or interactive debugging.
  • Collaborate closely with engineers, researchers, user ops, and other teams across the company to build the next-generation observability product.

Requirements

  • Experience operating large-scale distributed systems in production (especially logging systems or time series databases).
  • Comfortable working in ambiguous environments and solving open-ended problems.
  • Full-stack chops or strong product sensibilities; excited to build real tools people use.
  • Strong fundamentals in systems, networking, and cloud infrastructure (mentions Kubernetes, AWS).
  • Bonus: built or contributed to observability systems (e.g., Prometheus, OpenTelemetry).

Why This Team

  • A combined infra and product team building an AI application for internal use.
  • Work will directly power the reliability of GPT-based products at massive scale.
  • Help define what “AI-powered observability” looks like at one of the world’s most advanced AI labs.

Benefits

  • Base pay range listed and offers equity; total compensation may include performance-related bonus(es).
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax Health FSA, Dependent Care FSA, and commuter expense accounts.
  • 401(k) retirement plan with employer match.
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents); paid medical and caregiver leave (up to 8 weeks).
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
  • 13+ paid company holidays and multiple coordinated office closures; paid sick/safe time as required by law.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning and development stipend.
  • Daily meals in offices and meal delivery credits as eligible.
  • Relocation support for eligible employees.
  • Additional taxable fringe benefits (charitable donation matching, wellness stipends).

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety and diverse perspectives in building and deploying AI systems. Background checks and reasonable accommodations for applicants with disabilities are part of the hiring process.