Researcher, Computer Use - Agent Post-Training

at OpenAI

📍 San Francisco, United States

USD 250,000-380,000 per year

MIDDLE

✅ On-site

✅ Relocation

Used Tools & Technologies

LLM

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. Proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and know all the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Statistics @ 6, Machine Learning @ 6, API @ 3, ChatGPT @ 3, Codex @ 3, Observability @ 3, Reinforcement Learning @ 3, Data Pipelines @ 3

Details

About the team

The Agent Post-Training team creates the frontier agents OpenAI ships to the world. The team trains the models behind agents in Codex, ChatGPT, the API, and other frontier products: persistent, proactive intelligence that can operate computers, collaborate with people and other agents, and expand what people and organizations can imagine, attempt, and achieve. The team builds the data, environments, graders, training methods, and feedback loops that shape what OpenAI's next agents can do, and it carries those capabilities through major training runs into products.

About the role

As a member of Agent Post-Training, Computer Use, you will teach models to operate computers. You will help train models that can navigate browsers and desktops, use tools and applications, reason through complex workflows, collaborate with users and other agents, and complete long-horizon tasks with reliability and judgment. The work sits at the intersection of frontier model training, product behavior, evaluation, and systems engineering, and it will directly shape the computer-use capabilities shipped in OpenAI’s agents.

You will work with researchers, engineers, product teams, infrastructure teams, and safety/alignment partners to decide what should go into major model runs, measure outcomes, and ship improvements into products used by real people.

Responsibilities

Design and run experiments that improve agentic model behavior for complex computer use, including desktop and browser environments.
Own end-to-end improvements to the post-training stack: RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
Build evals and environments that expose the next set of model failures and turn failures into training data, product fixes, or new research directions.
Partner with Codex and ChatGPT product teams to translate product signal into model improvements.
Work on early-training and alignment interventions (data mixtures, objectives, synthetic data, eval loops) that shape downstream agent behavior.
Decide which integrations, capabilities, and fixes are ready for major model runs.
Improve machinery for large-scale training and launch (experiment velocity, reliability, observability, reproducibility, cost, latency, production readiness).
Take on cross-functional projects touching model training, product infrastructure, and the production agent harness (e.g., multi-agent systems, training against production-like environments).
Debug hard failures in shipped or near-shipped models and convert qualitative behavior into hypotheses, experiments, and fixes.

Requirements / Qualifications

Strong technical fundamentals in machine learning, software engineering, systems, statistics, or a related field, with the ability to learn across unfamiliar areas.
Hands-on experience with LLMs, reinforcement learning (RL), RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, coding agents, tool-using agents, or production ML systems.
Ability to define hypotheses, build pipelines, run models, analyze results, and decide next steps.
Comfort working across research, product, infrastructure, data, evals, and safety boundaries and communicating with each group.
Interest in product impact and model behavior (reliability, honesty, taste, ease of use).

Benefits and compensation

Compensation tier listed: $250K – $380K (offers equity).
Base pay may vary depending on market location, skills, and experience. Total compensation may include equity and performance-related bonuses for eligible employees.
Benefits include medical, dental, and vision insurance; employer contributions to Health Savings Accounts; pre-tax FSAs; 401(k) with employer match; paid parental and medical/caregiver leave; flexible PTO; 13+ paid company holidays; mental health and wellness support; employer-paid basic life and disability coverage; annual learning & development stipend; daily meals in offices and meal delivery credits; relocation support for eligible employees; and additional taxable fringe benefits (e.g., charitable donation matching, wellness stipends).
Background checks will be administered in accordance with applicable law. OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.