RE/RS, Data Understanding (MM)
Used Tools & Technologies
Machine Learning
Required Skills & Competences
Tag name is followed by "@" symbol and proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Reporting @ 3
Deep Learning @ 3
AI @ 3
Data Pipelines @ 3
Details
About the Team
The Data Understanding team is responsible for creating high-quality datasets and their quantized representations for OpenAI. This includes synthesizing multimodal data, building VQ representations, and handling processing, filtering, deduplication, quality control, and tokenization so that data can be used effectively in large-model training runs.
About the Role
We are looking to advance how OpenAI prepares, curates, synthesizes, and understands multimodal data at scale. You will work on research and production problems such as synthesizing multimodal content (images, audio, and video) together with its supervision signals, improving noisy data pipelines, building better quality filters, using models to automate data preparation, and measuring whether changes in the dataset improve model performance.
Responsibilities
- Drive research and production work on multimodal data synthesis and curation for large-model training.
- Build and improve data pipelines: processing, filtering, deduplication, quality control, and tokenization at scale.
- Develop quantized representations (e.g., VQ representations) for multimodal data.
- Use models to automate aspects of data preparation, and measure whether dataset changes improve downstream model performance.
- Own and drive a research agenda from problem selection to long-running impact.
Requirements
- Strong track record of new or improved ML ideas shown through publications, projects, or applied research.
- Ability to own and drive research agendas and long-running projects to impact.
- Comfortable with empirical, collaborative research approaches.
Nice to Have
- Experience with multimodal learning (audio, vision, video), synthetic data, or data-centric ML.
- Thoughtfulness about AI impact, including privacy, provenance, and data quality.
- Experience building high-performance deep learning systems or large-scale data processing systems.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring general-purpose artificial intelligence benefits all of humanity. We emphasize safety and human needs in building AI systems and value diverse perspectives and experiences. OpenAI is an equal opportunity employer and provides information on applicant privacy, reasonable accommodations for applicants with disabilities, and processes for reporting non-compliant job postings.
Benefits
- Base pay range listed in the posting; total compensation may include equity and performance-related bonuses.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts (Health FSA, Dependent Care FSA, commuter expenses).
- 401(k) retirement plan with employer match.
- Paid parental leave and paid medical/caregiver leave.
- Flexible PTO and paid company holidays and office closures.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend; daily office meals and meal delivery credits where eligible.
- Relocation support for eligible employees.