Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Hiring @ 3
Communication @ 3
LLM @ 3
ChatGPT @ 3
Details
The Frontier Evals & Environments team builds north-star model environments to drive progress towards safe AGI/ASI. The team creates ambitious environments to measure and steer models, and builds self-improvement loops to steer training, safety, and launch decisions. The team has open-sourced evaluations such as GDPval, SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer, and has built and run frontier evaluations for GPT-4o, o1, o3, GPT-4.5, ChatGPT Agent, and GPT-5.
Responsibilities
- Create ambitious RL environments to push models to their limits.
- Measure frontier model capabilities, skills, and behaviors.
- Develop methodologies for automatically exploring model behavior.
- Help steer training for large training runs and inform future directions.
- Design scalable systems and processes to support continuous evaluation.
- Build self-improvement loops to automate model understanding and evaluation.
Requirements
- Passionate and knowledgeable about AGI/ASI measurement.
- Strong engineering and statistical analysis skills.
- Ability to think creatively with a robust "red-teaming mindset."
- Experience in ML research engineering, stochastic systems, observability and monitoring, LLM-enabled applications, or another technical domain applicable to AI evaluations.
- Ability to scope and deliver projects end-to-end in a dynamic, fast-paced research environment.
Nice to have
- First-hand experience in red-teaming systems (computer systems or otherwise).
- Experience working cross-functionally.
- Excellent communication skills.
Benefits
- Base pay range listed: $200,000 - $370,000 (total compensation may include equity and performance-related bonuses).
- Medical, dental, and vision insurance with employer HSA contributions.
- Pre-tax accounts (Health FSA, Dependent Care FSA, commuter benefits).
- 401(k) retirement plan with employer match.
- Paid parental, medical, and caregiver leave.
- Paid time off and paid company holidays/office closures.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend.
- Daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees.
- Additional taxable fringe benefits (e.g., charitable donation matching, wellness stipends).
About OpenAI
OpenAI is an AI research and deployment company focused on ensuring general-purpose AI benefits all of humanity. The company emphasizes safety and inclusive perspectives, provides reasonable accommodations to applicants with disabilities, and administers background checks in accordance with applicable law. More details about policies and benefits are provided to candidates during the hiring process.