Software Engineer, Distributed Systems

at OpenAI
USD 250,000-460,000 per year
MIDDLE
✅ Hybrid
✅ Relocation

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 3
  • Distributed Systems @ 3
  • Performance Optimization @ 3
  • Rust @ 3
  • Debugging @ 3
  • API @ 3

Details

The Compute Runtime team builds low-level framework components to power ML training systems. The team focuses on building robust, scalable, high-performance components to support distributed training workloads, maximizing researcher and hardware productivity.

This role is based in San Francisco, CA. The team uses a hybrid work model (3 days in the office per week) and offers relocation assistance to new employees.

Responsibilities

  • Deliver APIs and orchestration for systems that run across thousands of machines, handling large volumes of moving and persisted data.
  • Design and implement easy-to-use, introspectable systems that support fast debugging and development cycles while scaling to large supercomputers with stability and high performance.
  • Profile and optimize compute and data pipelines to improve end-to-end performance.
  • Deploy the training framework to new supercomputers and rapidly respond to changing ML system architectures.
  • Work across the Python and Rust stack to implement and optimize components.

Requirements

  • Experience building large distributed systems.
  • Strong software engineering skills; proficiency in Python and Rust (or equivalent systems languages) is expected.
  • Experience with profiling, performance optimization, and high-performance I/O.
  • Comfort working on end-to-end systems, diagnosing performance bottlenecks, and designing for scale.
  • Ability to work in a fast-paced, evolving environment focused on ML training infrastructure and supercomputing deployments.

About the Team

The Compute Runtime team focuses on low-level runtime and framework components that power ML training at scale. The team prioritizes productivity for researchers and efficient use of hardware to accelerate progress toward advanced ML systems.

Benefits

  • Competitive base salary (range listed above) plus equity and potential performance bonus(es).
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax accounts (Health FSA, Dependent Care FSA, commuter benefits).
  • 401(k) with employer match.
  • Paid parental, medical, and caregiver leave; paid time off and company holidays.
  • Mental health and wellness support; employer-paid life and disability coverage.
  • Annual learning and development stipend.
  • Daily meals in offices and meal delivery credits as eligible.
  • Relocation support for eligible employees.
  • Background checks will be administered in accordance with applicable law.

(Original posting includes additional OpenAI policies and equal employment opportunity statements.)