Software Engineer, Applied Evals

at OpenAI

📍 San Francisco, United States

USD 255,000-325,000 per year

Mid-level

✅ Hybrid

✅ Relocation

Details

Applied Evals defines what good looks like for safe, advanced AI systems. The team turns complex, high-value workflows into clear, reproducible signals that guide model training and product quality. The work combines hands-on, unscalable efforts with systems that others can extend, creating a compounding loop of model improvement.

Role summary

We are hiring product-minded engineers to design and build evals and harnesses that capture real-world quality for advanced AI systems. You will own the loop from prototyping with users to building reliable pipelines and integrating signals into training stacks. The role spans the stack, from backend pipelines to user-facing interfaces. It includes evaluating multi-turn and tool-using systems, designing agent harnesses, and applying reinforcement learning and related methods in production settings. Engineers who succeed operate like founders or founding engineers: they take initiative, move quickly, and create structure where none exists.

This role is based in OpenAI's San Francisco HQ and uses a hybrid work model (3 days in office per week). Relocation assistance is offered for eligible employees.

Responsibilities

Define core evaluation signals that drive model improvement, turning vague product gaps into crisp, defensible measures of quality
Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable
Prototype solutions with real workflows and convert them into scalable feedback loops
Connect evaluation signals directly to research and training systems so that product improvements show up in the user experience
Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured
Build reusable systems and tools that enable contributions across the company and raise the quality bar

Requirements

4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
Familiarity with evaluation methods for large language models and patterns like multi-agent workflows, tool use, or long context
Familiarity with deep learning concepts or prior exposure to training models
Experience working across the stack (from backend pipelines to user-facing interfaces) and applying reinforcement learning or related methods in production settings
Clear communication across technical and non-technical audiences and ability to collaborate with research and product teams
Comfortable working in ambiguous, high-impact environments and iterating on solutions with users and stakeholders

Benefits & other details

Base pay range: $255,000 - $325,000 per year, plus equity
Medical, dental, and vision insurance with employer contributions to Health Savings Accounts
Pre-tax accounts (Health FSA, Dependent Care FSA, commuter expenses)
401(k) retirement plan with employer match
Paid parental leave and paid medical/caregiver leave; flexible PTO for exempt employees
13+ paid company holidays and other paid company office closures; paid sick or safe time as required by law
Mental health and wellness support; employer-paid basic life and disability coverage
Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible
Relocation support for eligible employees

About OpenAI

OpenAI is an AI research and deployment company focused on ensuring general-purpose artificial intelligence benefits all of humanity. The company is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities. Background checks will be administered in accordance with applicable law.