Inference Technical Lead, On-Device Transformers

at OpenAI
USD 445,000 per year
SENIOR
✅ Hybrid
✅ Relocation

Used Tools & Technologies

Machine Learning, GPU

Required Skills & Competences

CUDA @ 4, AI @ 4

Details

The Future of Computing Research team is an applied research team in the Consumer Devices group focused on developing new methods and models to support OpenAI's vision and mission of building AGI that benefits all of humanity. This role is based in San Francisco, CA and follows a hybrid model (four days a week in the office). Relocation assistance is offered to new employees.

Responsibilities

  • Evaluate and select silicon platforms (GPUs, NPUs, and specialized accelerators) for on-device and edge deployment of OpenAI models.
  • Work closely with research teams to co-design model architectures that meet real-world deployment constraints such as latency, memory, power, and bandwidth.
  • Analyze and model system performance, identifying tradeoffs between model design, memory hierarchy, compute throughput, and hardware capabilities.
  • Partner with hardware vendors and internal infrastructure teams to bring up new accelerators and ensure efficient execution of transformer workloads.
  • Build and lead a team of engineers responsible for implementing the low-level inference stack, including kernel development and runtime systems.
  • Turn nascent research capabilities into deployable systems by driving engineering across research and product boundaries.

Requirements

  • Experience evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators.
  • Understanding of the performance characteristics of transformer models, including attention, KV-cache behavior, and memory bandwidth requirements.
  • Experience designing or optimizing high-performance compute systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines.
  • Experience building or leading teams working on low-level performance-critical software such as CUDA kernels, compilers, or ML runtimes.
  • Ability to work closely with ML researchers and designers to translate research into production-ready on-device inference systems.

About the Team and Company

The team sits in the Consumer Products organization and collaborates with top ML researchers and design talent to push model capabilities. OpenAI is an AI research and deployment company focused on safely developing general-purpose artificial intelligence that benefits all of humanity. OpenAI is an equal opportunity employer, runs background checks on candidates, and provides accommodations for applicants with disabilities.

Benefits

  • Base pay (listed): $445K per year, plus equity.
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
  • 401(k) retirement plan with employer match.
  • Paid parental leave and paid medical/caregiver leave.
  • Flexible PTO and paid company holidays/office closures.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning and development stipend, daily meals in offices, and meal delivery credits where eligible.
  • Relocation support for eligible employees.