RE/RS, Data Understanding - Foundations

at OpenAI
USD 445,000-555,000 per year
MIDDLE
✅ On-site
✅ Relocation

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Deep Learning @ 3, AI @ 3

Details

The Data Understanding team builds high-quality datasets and their quantized representations for large model training. The work includes synthesizing data, building vector quantization (VQ) representations, and processing, filtering, deduplicating, quality-checking, and tokenizing data so it can be used effectively in web-scale pretraining. The role focuses on advancing how OpenAI builds and understands pretraining data at scale, treating data quality and curation as core research problems. In practice this means developing methods to select, combine, and transform data; creating datasets that improve model capabilities; designing experiments to understand how data choices affect model learning and behavior; and translating research into scalable data processing pipelines.

Responsibilities

  • Develop new methods to select, combine, and transform pretraining data at scale.
  • Create and evaluate datasets that improve model capabilities.
  • Design rigorous experiments to measure how data choices and interventions affect model learning and downstream behavior.
  • Translate successful research into scalable data processing pipelines for web-scale model training.
  • Own and drive a research agenda from problem selection through long-running work to impact.

Requirements

  • Strong track record of developing new or improved ML ideas, demonstrated through publications, projects, or applied research.
  • Experience working with frontier models and web-scale data.
  • Familiarity with data synthesis, vector quantization (VQ) representations, tokenization, deduplication, filtering, and quality control for large-model pretraining.
  • Ability to design and run rigorous empirical research and experiments.
  • Experience building high-performance deep learning systems or large-scale data processing systems (nice-to-have).
  • Thoughtfulness about AI's impact, including privacy, provenance, and data quality (nice-to-have).

About OpenAI

OpenAI is an AI research and deployment company focused on ensuring general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety, diverse perspectives, and equal employment opportunity. Background checks are administered in accordance with applicable law. OpenAI provides reasonable accommodations to applicants with disabilities.

Benefits

  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses
  • 401(k) retirement plan with employer match
  • Paid parental leave and paid medical/caregiver leave
  • Flexible paid time off and paid company holidays
  • Mental health and wellness support; employer-paid basic life and disability coverage
  • Annual learning and development stipend
  • Daily meals in offices and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits (charitable donation matching, wellness stipends)
  • Total compensation includes salary, generous equity, and performance-related bonuses where eligible