Inference Technical Lead, Sora

at OpenAI

📍 San Francisco, United States

USD 380,000 per year

SENIOR

✅ Hybrid

✅ Relocation

Used Tools & Technologies

Not specified

Required Skills & Competences
Tag name is followed by "@" symbol and proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Performance Optimization @ 4 GPU @ 4

Details

The Sora team is pioneering multimodal capabilities for OpenAI’s foundation models. The team is a hybrid research and product organization focused on integrating multimodal functionalities into AI products with an emphasis on reliability, usability, and broad societal benefit.

About the role

We’re looking for a GPU Inference Engineer to contribute to improvements in model serving efficiency for Sora. This is a high-impact role to drive initiatives that optimize inference performance and scalability. You will also be engaged in model design to help researchers develop inference-friendly models.

This role is based in San Francisco, CA and follows a hybrid model (3 days in office per week). OpenAI offers relocation assistance to new employees.

Responsibilities

Perform engineering work focused on improving model serving, inference performance, and system efficiency.
Drive optimizations from a kernel and data-movement perspective to improve system throughput and reliability.
Partner closely with research and product teams to ensure models perform effectively at scale.
Design, build, and improve critical serving infrastructure to support Sora’s growth and reliability needs.
Set technical direction, navigate ambiguity, and drive complex initiatives to completion.

Requirements

Deep expertise in model performance optimization, particularly at the inference layer.
Strong background in kernel-level systems, data movement, and low-level performance tuning.
Experience scaling high-performing AI systems that serve real-world multimodal workloads.
Ability to collaborate with researchers and product teams and to lead technical initiatives.

Benefits

Base pay around $380K (role summary provided). Total compensation may include equity and performance-related bonuses.
Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
Pre-tax accounts (Health FSA, Dependent Care FSA, commuter benefits).
401(k) with employer match.
Paid parental, medical, and caregiver leave; flexible PTO and paid holidays.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend.
Daily meals in offices and meal delivery credits as eligible.
Relocation support for eligible employees.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety, inclusion of diverse perspectives, and lawful background checks where applicable. OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.