People Research Data Scientist, AI Fairness & Bias

at OpenAI
USD 198,000-220,000 per year
MIDDLE
✅ Hybrid
✅ Relocation

Used Tools & Technologies

LLM, GenAI

Required Skills & Competences

Security @ 3, Automated Testing @ 3, Python @ 5, SQL @ 5, R @ 5, Statistics @ 3, Data Science @ 3, Hiring @ 3, Experimentation @ 3, Reporting @ 3, Audit @ 3, Generative AI @ 3, AI @ 3, Agentic AI @ 3

Details

About the Team

OpenAI’s People team hires, engages, and retains world-class talent to safely build and deploy AGI that benefits all of humanity. The People Analytics team helps leaders make rigorous, evidence-based talent decisions and ensures that the systems supporting those decisions are valid, reliable, fair, and accountable.

About the Role

As a People Data Scientist focused on AI fairness and bias testing, you will help establish how OpenAI evaluates AI-assisted People systems and high-impact talent processes. You will design and conduct rigorous assessments to identify, measure, and mitigate potential bias across the lifecycle of models, agents, decision-support tools, and automated workflows. Your work will span the entire employee lifecycle (hiring, performance, promotion, employee development, workforce planning, etc.) and will evaluate both technical systems and the broader human–AI decision processes, including data quality, measurement validity, differential outcomes, human oversight, and unintended consequences. San Francisco, CA is the preferred location for this role.

Responsibilities

  • Define and lead fairness and bias-testing strategies for AI-assisted People processes, models, agents, and decision-support systems from development through deployment and ongoing monitoring.
  • Design rigorous algorithmic audits and validation studies, including adverse-impact analysis, subgroup and intersectional evaluation, error-rate analysis, calibration, measurement invariance, reliability, criterion-related validity, and sensitivity testing.
  • Identify appropriate fairness criteria for each use case, evaluate tradeoffs among competing definitions of fairness, and document assumptions, limitations, and residual risks.
  • Evaluate end-to-end human–AI decision systems, including model outputs, user behavior, human overrides, escalation pathways, and whether AI assistance changes the quality, consistency, or equity of decisions.
  • Develop evaluation approaches for generative and agentic AI, including test-set design, counterfactual testing, behavioral evaluation, human-rating studies, robustness testing, and analysis of disparate performance across populations and contexts.
  • Investigate sources of observed disparities (data representation, label and measurement bias, proxy variables, model design, decision thresholds, workflow design, differential adoption/usage).
  • Partner with engineering, People Operations, Legal, Privacy, Security, and People Systems teams to recommend and evaluate mitigations (data improvements, model changes, threshold adjustments, workflow redesign, monitoring controls, additional human oversight).
  • Build scalable fairness-evaluation infrastructure: reusable datasets, automated validation pipelines, regression tests, monitoring systems, self-service tools, and standardized reporting.
  • Establish research and documentation standards for fairness test plans, dataset and model documentation, validation reports, limitations, monitoring plans, and decision records.
  • Translate complex findings into concise, decision-ready narratives for technical teams and senior leaders.

Requirements

  • Deep expertise in algorithmic fairness, bias measurement, responsible AI, psychometrics, applied statistics, or evaluation of high-impact decision systems.
  • Exceptional strength in research design, measurement, experimentation, causal inference, and statistical modeling.
  • Hands-on experience applying methods such as subgroup and intersectional analysis, adverse-impact testing, equalized-odds and equal-opportunity analysis, demographic-parity assessment, calibration analysis, counterfactual testing, measurement invariance, reliability analysis, and validation studies.
  • Strong judgment about the limitations of fairness metrics and ability to select appropriate measures for different decision contexts.
  • Experience evaluating machine-learning models, generative AI systems, agents, or human–AI workflows using quantitative and qualitative evidence.
  • High proficiency in SQL and in either Python or R, with experience working across complex, sensitive, and imperfect datasets.
  • Experience building reproducible evaluation pipelines, automated testing frameworks, analytical tools, monitoring systems, or governed research workflows.
  • Ability to distinguish statistical disparities from potential causes and to communicate findings without overstating certainty or making unsupported causal or legal conclusions.
  • Ability to work effectively with technical, operational, legal, privacy, and executive stakeholders and influence consequential decisions through evidence and sound judgment.
  • Strong attention to detail, intellectual humility, and commitment to developing AI systems and organizational processes that work well for people across different backgrounds.

Preferred Qualifications

  • Experience conducting fairness assessments, algorithmic audits, model-risk reviews, adverse-impact analyses, or validation studies in employment or other high-impact domains.
  • Familiarity with fairness and model-evaluation tools such as Fairlearn, AI Fairness 360, responsible-AI evaluation frameworks, explainability methods, or comparable internal tooling.
  • Experience evaluating large language models, generative AI systems, safety classifiers, or agentic workflows, including behavioral testing and human evaluation.
  • Experience with employment selection, talent assessment, psychometrics, or validation of hiring/performance/promotion/workforce decisions.
  • Familiarity with responsible-AI frameworks and emerging requirements related to automated employment decision systems, algorithmic auditing, data privacy, and AI governance.
  • Experience creating model cards, dataset documentation, fairness scorecards, audit reports, monitoring plans, or other review artifacts for high-impact systems.
  • Advanced degree in Quantitative Psychology, Computer Science, Statistics, Economics, Data Science, Behavioral Science, or a related quantitative field; PhD preferred but not required.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them through our products. OpenAI is an equal opportunity employer and is committed to inclusive hiring and reasonable accommodations for applicants with disabilities.

Benefits

  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts.
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit).
  • 401(k) retirement plan with employer match.
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks).
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
  • 13+ paid company holidays and multiple coordinated, paid company office closures throughout the year, plus paid sick or safe time as required by law.
  • Mental health and wellness support.
  • Employer-paid basic life and disability coverage.
  • Annual learning and development stipend.
  • Daily meals in offices and meal delivery credits, where eligible.
  • Relocation support for eligible employees.
  • Additional taxable fringe benefits such as charitable donation matching and wellness stipends.