Inference Technical Lead, On-Device Transformers

at OpenAI

📍 San Francisco, United States

USD 445,000 per year

SENIOR

✅ Hybrid

✅ Relocation

Tools & Technologies

Machine Learning, GPU

Required Skills & Competencies
Each tag name is followed by the "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and know the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of, and adeptness in, all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

CUDA @ 4, AI @ 4

Details

The Future of Computing Research team is an applied research team in the Consumer Devices group focused on developing new methods and models to support OpenAI's vision and mission of building AGI that benefits all of humanity. This role is based in San Francisco, CA and follows a hybrid model (four days a week in the office). Relocation assistance is offered to new employees.

Responsibilities

Evaluate and select silicon platforms (GPUs, NPUs, and specialized accelerators) for on-device and edge deployment of OpenAI models.
Work closely with research teams to co-design model architectures that meet real-world deployment constraints such as latency, memory, power, and bandwidth.
Analyze and model system performance, identifying tradeoffs between model design, memory hierarchy, compute throughput, and hardware capabilities.
Partner with hardware vendors and internal infrastructure teams to bring up new accelerators and ensure efficient execution of transformer workloads.
Build and lead a team of engineers responsible for implementing the low-level inference stack, including kernel development and runtime systems.
Turn nascent research capabilities into deployable ones by driving engineering work across research and product boundaries.

Requirements

Experience evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators.
Understanding of the performance characteristics of transformer models, including attention, KV-cache behavior, and memory bandwidth requirements.
Experience designing or optimizing high-performance compute systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines.
Experience building or leading teams working on low-level performance-critical software such as CUDA kernels, compilers, or ML runtimes.
Ability to work closely with ML researchers and designers to translate research into production-ready on-device inference systems.

About the Team and Company

The team sits in the Consumer Products organization and collaborates with top ML researchers and design talent to push model capabilities. OpenAI is an AI research and deployment company focused on safely developing general-purpose artificial intelligence that benefits all of humanity. OpenAI is an equal opportunity employer; it conducts background checks and provides accommodations for applicants with disabilities.

Benefits

Base pay: $445K per year (as listed), plus equity.
Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
401(k) retirement plan with employer match.
Paid parental leave and paid medical/caregiver leave.
Flexible PTO and paid company holidays/office closures.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend, daily meals in offices, and meal delivery credits where eligible.
Relocation support for eligible employees.