Researcher: Agent Post-Training, API & Power-Users

at OpenAI

📍 San Francisco, United States

USD 295,000-445,000 per year

MIDDLE

✅ Hybrid

✅ Relocation

Used Tools & Technologies

LLM

Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value. Proficiency levels are defined as follows:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and know the pitfalls and tricks;

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Statistics @ 6, Machine Learning @ 6, API @ 3, ChatGPT @ 3, Codex @ 3, Observability @ 3, AI @ 3

Details

About the team

The Agent Post-Training team creates the frontier agents OpenAI ships to the world. The team trains the models behind agents in Codex, ChatGPT, the API, and other frontier products: persistent, proactive intelligence that can operate computers, collaborate with people and other agents, and expand what people and organizations can imagine, attempt, and achieve. The team's work spans coding, tool use, computer use, multi-agent coordination, long-horizon execution, factuality, instruction following, calibrated reasoning, and taste. The team builds the data, environments, graders, training methods, and feedback loops that shape OpenAI's next agents and carries those capabilities through major training runs and into products.

Role overview

As a member of the API & power-users team, you will improve the capabilities, reliability, and product fit of OpenAI’s agentic models for power users and API developers. Responsibilities include designing evals from real developer workflows; building training environments around production-like tool use; turning qualitative model failures into training data, evals, and post-training interventions; and driving behavior improvements from discovery through post-training, integration, and launch. You will work across research, engineering, data, evals, and product, partnering closely with researchers, engineers, API/product teams, Codex, infrastructure, and safety/alignment partners.

Responsibilities

Design and run experiments that improve model behavior in API and power-user workflows (function calling, tool use, coding, planning, long-horizon execution, factuality, instruction following, error recovery, calibrated reasoning).
Build evals, graders, and environments from real developer and power-user workflows; convert observed failures into training data, model-behavior hypotheses, and shipped improvements.
Partner with API teams and power users to identify high-leverage behavior gaps and convert product signals into post-training interventions.
Improve model behavior when composed into systems: reliable tool use, respecting developer intent, handling partial failures, asking for clarification, and maintaining coherence across multi-step tasks.
Own end-to-end model behavior projects from qualitative failure analysis through data generation, training experiments, eval design, integration into major runs, and launch readiness.
Develop feedback loops using power-user traces, API usage patterns, and production-like environments to discover agentic model failures and gaps.
Decide which agentic capabilities, behavioral fixes, and partner-team integrations are ready for inclusion in major model runs.
Debug hard failures in shipped or near-shipped models by moving between traces, evals, training data, model outputs, and product context.
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
Improve machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
Take on cross-functional projects touching model training, product infrastructure, and the production agent harness (e.g., multi-agent systems or training against production-like environments).

Requirements

Strong technical fundamentals in machine learning, software engineering, systems, statistics, or applied research, and the ability to learn quickly across unfamiliar parts of the stack.
Hands-on experience with LLMs, post-training methods, RL / RLHF / RLAIF, evals, graders, synthetic data, coding agents, tool-using agents, API products, or production ML systems.
Strong taste for model behavior: ability to read transcripts, traces, eval failures, or API interactions and form concrete hypotheses about what the model needs to learn.
Comfortable turning ambiguous model behavior problems into concrete progress using data, training, evals, product changes, or combinations of approaches.
Ability to work across research, product, infrastructure, data, evals, and safety boundaries and to communicate clearly with each group.
Comfortable building load-bearing systems and processes when needed.

Benefits & Compensation

Estimated base salary range: $295,000 - $445,000. The role offers equity, and total compensation may also include performance-related bonuses.
Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
401(k) retirement plan with employer match.
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks).
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
13+ paid company holidays and multiple coordinated office closures, plus paid sick or safe time as required by applicable law.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend.
Daily meals in offices and meal delivery credits as eligible.
Relocation support for eligible employees.
Additional taxable fringe benefits (charitable donation matching, wellness stipends) may be provided.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety and inclusion and conducts lawful background checks where applicable. OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.