Software Engineer, Caching Infrastructure

at OpenAI

📍 San Francisco, United States

USD 255,000-405,000 per year

MIDDLE

✅ On-site

✅ Relocation

Used Tools & Technologies

Not specified

Required Skills & Competences
Each skill tag is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and you know the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Kubernetes @ 6
Memcached @ 6
Redis @ 6
Distributed Systems @ 6
Networking @ 6
API @ 3
ChatGPT @ 3
Compliance @ 3

Details

About the Team

At OpenAI, we’re building safe and beneficial artificial general intelligence. We deploy our models through ChatGPT, our APIs, and other cutting-edge products. Behind the scenes, making these systems fast, reliable, and cost-efficient requires world-class infrastructure.

The Caching Infrastructure team is responsible for building a caching layer that powers many critical use cases at OpenAI. We aim to provide a high-availability, multi-tenant cache platform that scales automatically with workload, minimizes tail latency, and supports a diverse range of use cases.

We’re looking for an experienced engineer to help design and scale this critical infrastructure. The ideal candidate has deep experience in distributed caching systems (e.g., Redis, Memcached), networking fundamentals, and Kubernetes-based service orchestration.

Responsibilities

Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences.
Define the long-term vision and roadmap for caching as a core infrastructure capability, balancing performance, durability, and cost.
Collaborate with other infrastructure teams (networking, observability, databases) and product teams to ensure the caching platform meets their needs.

Requirements

5+ years of experience building and scaling distributed systems, with a strong focus on caching, load balancing, or storage systems.
Deep expertise with Redis, Memcached, or similar solutions, including clustering, durability configurations, client-side connection patterns, and performance tuning.
Production experience with Kubernetes, service meshes (e.g., Envoy), and autoscaling systems.
Strong understanding of networking fundamentals and architecture trade-offs related to latency, reliability, throughput, and cost.
Experience designing and operating multi-tenant, high-availability platforms.
Ability to work in a fast-paced environment and balance pragmatic engineering with long-term technical excellence.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI system capabilities and seek to safely deploy them to the world through our products. OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.

Background checks for applicants will be administered in accordance with applicable law. Additional policy and compliance links are provided in the original posting.

Benefits

Base pay in the listed range (see job posting) plus generous equity and performance-related bonuses for eligible employees.
Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts.
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
401(k) retirement plan with employer match.
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks).
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
13+ paid company holidays and additional coordinated office closures.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
Relocation support for eligible employees.
Additional taxable fringe benefits (charitable donation matching, wellness stipends) as applicable.