Software Engineer, Applied Evals

at OpenAI
USD 255,000-325,000 per year
MIDDLE
✅ Hybrid
✅ Relocation

Required Skills & Competences

Hiring @ 3 Communication @ 3

Details

Applied Evals defines what good looks like for safe, advanced AI systems. The team turns complex, high-value workflows into clear, reproducible signals that guide model training and product quality. Work combines hands-on, unscalable efforts with systems that others can extend, creating a compounding loop of model improvement.

Role summary

We are hiring product-minded engineers to design and build evals and harnesses that capture real-world quality for advanced AI systems. You will own the loop from prototyping with users to building reliable pipelines and integrating signals into training stacks. The role spans the stack, from backend pipelines to user-facing interfaces, and includes evaluating multi-turn and tool-using systems, designing agent harnesses, and applying reinforcement learning and related methods in production settings. Engineers who succeed here operate like founders or founding engineers: they take initiative, move quickly, and create structure where none exists.

This role is based in OpenAI's San Francisco HQ and uses a hybrid work model (3 days in office per week). Relocation assistance is offered for eligible employees.

Responsibilities

  • Define core evaluation signals that drive model improvement, turning vague product gaps into crisp, defensible measures of quality
  • Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable
  • Prototype solutions with real workflows and convert them into scalable feedback loops
  • Connect evaluation signals directly to research and training systems so product improvements show up in user experience
  • Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured
  • Build reusable systems and tools that enable contributions across the company and raise the quality bar

Requirements

  • 4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
  • Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
  • Familiarity with evaluation methods for large language models and patterns like multi-agent workflows, tool use, or long context
  • Familiarity with deep learning concepts or prior exposure to training models
  • Experience working across the stack (backend pipelines to user-facing interfaces) and applying reinforcement learning or related methods in production settings
  • Clear communication across technical and non-technical audiences and ability to collaborate with research and product teams
  • Comfortable working in ambiguous, high-impact environments and iterating on solutions with users and stakeholders

Benefits & other details

  • Base pay range: $255,000-$325,000 per year, plus equity
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts
  • Pre-tax accounts (Health FSA, Dependent Care FSA, commuter expenses)
  • 401(k) retirement plan with employer match
  • Paid parental leave and paid medical/caregiver leave; flexible PTO for exempt employees
  • 13+ paid company holidays and other paid company office closures; paid sick or safe time as required by law
  • Mental health and wellness support; employer-paid basic life and disability coverage
  • Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible
  • Relocation support for eligible employees

About OpenAI

OpenAI is an AI research and deployment company focused on ensuring general-purpose artificial intelligence benefits all of humanity. The company is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities. Background checks will be administered in accordance with applicable law.