Technical Lead, Safety Research at OpenAI
USD 460,000-555,000 per year
SENIOR
✅ Hybrid
✅ Relocation

Used Tools & Technologies

Not specified

Required Skills & Competences

Machine Learning (4+ years)

Details

About the Team

The Safety Systems team is responsible for a broad range of safety work that ensures our best models can be safely deployed to the real world to benefit society. The Safety Research team aims to fundamentally advance our capabilities for precisely implementing robust, safe behavior in AI models and systems. As capabilities continue to advance, our approaches to safety must improve and scale to address evolving risks, including keeping systems robust against harmful misuse and preventing misalignment that could cause harm. The team works on methods grounded in current models as well as methods that generalize to future systems: exploratory research on improving safety common sense and generalizable reasoning, new evaluations that elicit or detect misalignment or inner goals, and approaches that support human oversight of long-running tasks.

About the Role

As a tech lead, you will develop strategy for new research directions that address potential harms from misalignment or significant mistakes. Key activities include:

  • Setting north star goals and milestones for new research directions, and developing challenging evaluations to track progress.
  • Personally driving or leading research in new exploratory directions to demonstrate feasibility and scalability of the approaches.
  • Working horizontally across safety research and related teams to ensure different technical approaches work together to achieve strong safety results.

This role is based in San Francisco, CA. The team uses a hybrid work model of 3 days in the office per week and offers relocation assistance to new employees.

Responsibilities

  • Set research directions and strategies to make AI systems safer, more aligned, and more robust.
  • Coordinate and collaborate with cross-functional teams, including research, Trust & Safety (T&S), policy, and related alignment teams, to ensure AI meets high safety standards.
  • Actively evaluate and understand the safety of models and systems, identify areas of risk, and propose mitigation strategies.
  • Conduct state-of-the-art research on AI safety topics such as RLHF, adversarial training, robustness, and more.
  • Implement new methods in OpenAI's core model training and launch safety improvements in OpenAI's products.

Requirements

  • 4+ years of experience in the field of AI safety, especially in areas such as RLHF, adversarial training, robustness, and fairness and bias.
  • Strong track record of practical research on safety and alignment, ideally in AI and large language models (LLMs), and experience leading large research efforts.
  • A Ph.D. or other advanced degree in computer science, machine learning, or a related field.
  • Experience in safety work for AI model deployment.
  • In-depth understanding of deep learning research and/or strong engineering skills.
  • Team player who enjoys collaborative work environments.

Benefits

  • Competitive base pay (see salary range).
  • Generous equity and performance-related bonuses for eligible employees.
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax accounts (Health FSA, Dependent Care FSA, commuter benefits).
  • 401(k) retirement plan with employer match.
  • Paid parental, medical, and caregiver leave; PTO and paid company holidays.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning and development stipend, daily meals in offices, meal delivery credits, and relocation support for eligible employees.
  • Background checks and reasonable accommodations provided as described in the posting.

Additional Details

  • Location: San Francisco, CA (hybrid; 3 days in office per week).
  • Relocation assistance is offered to new employees.
  • The role focuses on research and engineering to improve AI safety, with emphasis on RLHF, adversarial training, robustness, model training, evaluation design, and human oversight.