Software Engineer, Research Infrastructure

at OpenAI
USD 255,000-490,000 per year
MIDDLE
✅ Hybrid
✅ Relocation

SCRAPED

Used Tools & Technologies

Not specified

Required Skills & Competences ?

Kubernetes @ 3 CI/CD @ 3 Azure @ 3 GPU @ 3

Details

This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world’s largest, most reliable, and frictionless GPU fleet to support OpenAI’s general purpose model training and deployment. Work on this team includes maximizing GPU utilization through scheduling and quota systems, automation for Kubernetes cluster provisioning and upgrades, supporting research workflows with service frameworks and deployment systems, and ensuring fast model startup times via high-performance snapshot delivery across blob storage down to hardware caching.

Responsibilities

  • Design, implement, deploy, and operate components of the compute fleet, including job scheduling, cluster management, snapshot delivery, and CI/CD systems.
  • Interface with researchers and product teams to understand workload requirements.
  • Collaborate with hardware, infrastructure, and business teams to provide a high-utilization, high-reliability service.

Requirements

  • Experience with hyperscale compute systems.
  • Strong programming skills.
  • Experience working in public clouds (especially Azure).
  • Experience working with Kubernetes.
  • Execution-focused mentality with a rigorous focus on user requirements.
  • Bonus: understanding of AI/ML workloads.

Benefits

  • Competitive base pay and total compensation (range listed below), plus equity and performance-related bonuses for eligible employees.
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax accounts (Health FSA, Dependent Care FSA, commuter expenses).
  • 401(k) retirement plan with employer match.
  • Paid parental leave and paid medical/caregiver leave.
  • Paid time off (flexible PTO for exempt employees; up to 15 days annually for non-exempt employees).
  • 13+ paid company holidays and additional coordinated office closures.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning and development stipend.
  • Daily meals in offices and meal delivery credits as eligible.
  • Relocation support for eligible employees.

Location & Work Model

  • This role is based in San Francisco, CA (United States).
  • Hybrid work model: 3 days in the office per week.
  • Relocation assistance is offered to new employees.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them through our products. OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.