TL, Research Inference

at OpenAI

📍 San Francisco, United States

USD 380,000-555,000 per year

MIDDLE

✅ On-site

✅ Relocation

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. Proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Distributed Systems @ 3, Communication @ 3, Debugging @ 3, GPU @ 3, Observability @ 3, AI @ 3, Profiling @ 3

Details

The Foundations team studies how model behavior changes as models, data, and compute scale. The team investigates interactions between model architecture, optimization, and training data, and uses those insights to guide model design and training. This role builds systems that enable advanced AI models to run efficiently at scale. It operates at the intersection of model research and systems engineering, translating architectural ideas into high-performance inference systems that expose tradeoffs in performance, memory, and scalability. This is a research-enabling systems role (not product-serving) focused on performance, correctness, and realism.

Responsibilities

Design and build high-performance inference runtimes for large-scale AI models, focusing on efficiency, reliability, and scalability.
Own and optimize core execution paths, including model execution, memory management, batching, and scheduling.
Develop and improve distributed inference across multiple GPUs, including parallelism strategies, communication patterns, and runtime coordination.
Implement and optimize inference-critical operators and kernels informed by real-world workloads.
Partner closely with research teams to ensure new model architectures are supported accurately and efficiently in inference systems.
Diagnose and resolve performance bottlenecks through profiling, benchmarking, and low-level debugging.
Contribute to observability, correctness, and reliability of large-scale AI systems.

Requirements / Qualifications

Experience building production inference systems (beyond training or ad-hoc model runs).
Comfort with GPU-centric performance engineering, including GPU memory behavior and latency/throughput tradeoffs.
Experience with multi-GPU or distributed systems involving batching, scheduling, or runtime coordination.
Ability to reason end-to-end about inference pipelines, from request handling through execution and output streaming.
Ability to understand research ideas and implement them within real system and performance constraints.
Experience solving large-scale, ambiguous systems problems, and a preference for hands-on technical ownership and execution.

Benefits

Base pay range listed for the role (see posting): $380K – $555K; total compensation may include equity and performance-related bonuses.
Medical, dental, and vision insurance for employees and families, with employer contributions to Health Savings Accounts.
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
401(k) retirement plan with employer match.
Paid parental leave and medical/caregiver leave.
Paid time off (flexible PTO for exempt employees; up to 15 days annually for non-exempt employees), 13+ paid company holidays, and paid sick/safe time as required by law.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
Relocation support for eligible employees.
Additional fringe benefits such as charitable donation matching and wellness stipends may be provided.