Researcher, Connectors - Agent Post-Training

at OpenAI

📍 San Francisco, United States

USD 250,000-380,000 per year

MIDDLE

✅ Hybrid

✅ Relocation

Used Tools & Technologies

LLM

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Statistics @ 6
GitHub @ 3
Machine Learning @ 6
Slack @ 3
API @ 3
ChatGPT @ 3
Salesforce @ 3
Codex @ 3
Observability @ 3
AI @ 3
Reinforcement Learning @ 3
Data Pipelines @ 3

Details

About the Team

The Agent Post-Training team creates the frontier agents OpenAI ships to the world. We train the models behind agents in Codex, ChatGPT, the API, and other frontier products: persistent, proactive intelligence that can operate computers, collaborate with people and other agents, and expand what people and organizations can imagine, attempt, and achieve.

We define what the next generation of agents should be able to do, build the training signal that teaches those abilities, and run the experiments that make them real. Work spans coding, tool use, computer use, multi-agent coordination, long-horizon execution, factuality, instruction following, calibrated reasoning, and taste. The team builds data, environments, graders, training methods, and feedback loops that shape what OpenAI's next agents can do, and carries those capabilities through major training runs into products used by real people.

Responsibilities

Design and run experiments that improve agentic model behavior for complex software and plugins.
Own end-to-end improvements to the post-training stack, including reinforcement learning, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
Partner with Codex and ChatGPT product teams to translate product signal into model improvements.
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
Decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
Take on cross-functional projects touching model training, product infrastructure, and the production agent harness (e.g., multi-agent systems or training against production-like environments).
Debug hard failures in shipped or near-shipped models and convert qualitative behavior into hypotheses, experiments, and fixes.

Requirements

Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, and ability to learn across unfamiliar areas quickly.
Hands-on experience with large language models (LLMs), reinforcement learning (RL), RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems.
Comfort with open-ended problems where the path is unclear and the signal is noisy; ability to combine research taste and engineering execution.
Product-mindedness: caring about model behavior and practical impact beyond benchmarks.
Ability to move from behavioral problems to concrete experiments: define hypotheses, build pipelines, run models, analyze results, and decide next steps.
Comfort working across research, product, infrastructure, data, evals, and safety teams, and communicating clearly with each group.
Willingness to build load-bearing systems and processes when needed.

Technologies and integrations mentioned

Codex, ChatGPT, API, Slack, Google Workspace, GitHub, Notion, Linear, Salesforce; topics include LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, data pipelines, reward signals, diagnostics, model-behavior analysis, multi-agent systems, and production ML systems.

Compensation

Compensation range: $250,000 - $380,000 USD. Total compensation may include equity and performance-related bonuses.

Benefits

Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
Pre-tax accounts (Health FSA, Dependent Care FSA, commuter accounts).
401(k) with employer match.
Paid parental, medical, and caregiver leave.
Paid time off (flexible PTO for exempt employees; up to 15 days annually for non-exempt employees), 13+ paid holidays, and paid sick/safe time as required by law.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend.
Daily meals in offices and meal delivery credits as eligible.
Relocation support for eligible employees.
Additional fringe benefits (charitable donation matching, wellness stipends) may be provided.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring general-purpose AI benefits all of humanity. The company emphasizes safety, diversity, and equitable workplaces, and conducts background checks in accordance with applicable laws. Reasonable accommodations are provided to applicants with disabilities.