Used Tools & Technologies
Not specified
Required Skills & Competences
Chef @ 3, Kubernetes @ 3, Linux @ 3, Terraform @ 3, CI/CD @ 3, Algorithms @ 3, Hiring @ 3, Networking @ 3
Details
The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. The team oversees large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Work prioritizes safety, reliability, and responsible AI deployment. This role is based in San Francisco, CA and uses a hybrid work model of 3 days in the office per week. Relocation assistance is offered to new employees.
Responsibilities
- Design and build systems to manage both cloud and bare-metal fleets at scale.
- Develop tools that integrate low-level hardware metrics with high-level job scheduling and cluster management algorithms.
- Leverage LLMs to coordinate vendor operations and optimize infrastructure workflows.
- Automate infrastructure processes to reduce repetitive toil and improve system reliability.
- Collaborate with hardware, infrastructure, and research teams to ensure seamless integration across the stack.
- Continuously improve tools, automation, processes, and documentation to enhance operational efficiency.
Requirements
- Strong software engineering skills with experience in large-scale infrastructure environments.
- Broad knowledge of cluster-level systems (e.g., Kubernetes, CI/CD pipelines, Terraform, cloud providers).
- Deep expertise in server-level systems (e.g., systemd, containerization, Chef, Linux kernels, firmware management, host routing).
- Passion for optimizing the performance and reliability of large compute fleets.
- Ability to thrive in dynamic environments and solve complex infrastructure challenges.
- Focus on automation, efficiency, and continuous improvement.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety and human-centered development and is an equal opportunity employer. Background checks will be administered in accordance with applicable law. Reasonable accommodations for applicants with disabilities are available.
Benefits
- Base salary in the range listed for this role, plus generous equity and potential performance-related bonuses.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
- 401(k) retirement plan with employer match.
- Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks).
- Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
- 13+ paid company holidays and multiple company office closures throughout the year, plus paid sick or safe time as required by local law.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend.
- Daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees.
- Additional taxable fringe benefits such as charitable donation matching and wellness stipends.
For more details about benefits and policies, see OpenAI's published policy documents or ask during the hiring process.