Software Engineer, Distributed Systems
Required Skills & Competences
- Python @ 3
- Distributed Systems @ 3
- Performance Optimization @ 3
- Rust @ 3
- Debugging @ 3
- API @ 3
Details
The Compute Runtime team builds low-level framework components to power ML training systems. The team focuses on building robust, scalable, high-performance components to support distributed training workloads, maximizing researcher and hardware productivity.
This role is based in San Francisco, CA. The team uses a hybrid work model (3 days in the office per week) and offers relocation assistance to new employees.
Responsibilities
- Deliver APIs and orchestration for systems that run across thousands of machines, handling large volumes of moving and persisted data.
- Design and implement easy-to-use, introspectable systems that support fast debugging and development cycles while scaling to large supercomputers with stability and high performance.
- Profile and optimize compute and data pipelines to improve end-to-end performance.
- Deploy the training framework to new supercomputers and rapidly respond to changing ML system architectures.
- Work across the Python and Rust stack to implement and optimize components.
Requirements
- Experience building large distributed systems.
- Strong software engineering skills; proficiency in Python and Rust (or equivalent systems languages) is expected.
- Experience with profiling, performance optimization, and high-performance I/O.
- Comfort working on end-to-end systems, diagnosing performance bottlenecks, and designing for scale.
- Ability to work in a fast-paced, evolving environment focused on ML training infrastructure and supercomputing deployments.
About the Team
The Compute Runtime team focuses on low-level runtime and framework components that power ML training at scale. The team prioritizes productivity for researchers and efficient use of hardware to accelerate progress toward advanced ML systems.
Benefits
- Competitive base salary plus equity and potential performance bonuses.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts (Health FSA, Dependent Care FSA, commuter benefits).
- 401(k) with employer match.
- Paid parental, medical, and caregiver leave; paid time off and company holidays.
- Mental health and wellness support; employer-paid life and disability coverage.
- Annual learning and development stipend.
- Daily meals in offices and meal delivery credits for eligible employees.
- Relocation support for eligible employees.
- Background checks will be administered in accordance with applicable law.
(Original posting includes additional OpenAI policies and equal employment opportunity statements.)