RE/RS, Data Understanding - Foundations
Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 - basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 - daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 - you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 - exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Deep Learning @ 3
AI @ 3
Details
The Data Understanding team builds high-quality datasets and their quantized representations for large model training. The work includes synthesizing data, building vector quantization (VQ) representations, processing, filtering, deduplication, quality control, and tokenization, so that data can be used effectively in web-scale pretraining. The role focuses on advancing how OpenAI builds and understands pretraining data at scale. This means treating data quality and curation as core research problems: developing methods to select, combine, and transform data; creating datasets that improve model capabilities; designing experiments to understand how data choices affect model learning and behavior; and translating research into scalable data processing pipelines.
Responsibilities
- Develop new methods to select, combine, and transform pretraining data at scale.
- Create and evaluate datasets that improve model capabilities.
- Design rigorous experiments to measure how data choices and interventions affect model learning and downstream behavior.
- Translate successful research into scalable data processing pipelines for web-scale model training.
- Own and drive a research agenda from problem selection through long-running work to impact.
Requirements
- Strong track record of developing new or improved ML ideas, demonstrated through publications, projects, or applied research.
- Experience working with frontier models and web-scale data.
- Familiarity with data synthesis, vector quantization (VQ) representations, tokenization, deduplication, filtering, and quality control for large-model pretraining.
- Ability to design and run rigorous empirical research and experiments.
- Experience building high-performance deep learning systems or large-scale data processing systems (nice-to-have).
- Thoughtfulness about AI's impact, including privacy, provenance, and data quality (nice-to-have).
About OpenAI
OpenAI is an AI research and deployment company focused on ensuring general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety, diverse perspectives, and equal employment opportunity. Background checks are administered in accordance with applicable law. OpenAI provides reasonable accommodations to applicants with disabilities.
Benefits
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts
- Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses
- 401(k) retirement plan with employer match
- Paid parental leave and paid medical/caregiver leave
- Flexible paid time off and paid company holidays
- Mental health and wellness support; employer-paid basic life and disability coverage
- Annual learning and development stipend
- Daily meals in offices and meal delivery credits as eligible
- Relocation support for eligible employees
- Additional taxable fringe benefits (charitable donation matching, wellness stipends)
- Total compensation includes salary, generous equity, and performance-related bonuses where eligible