Datacenter Hardware Operations Technician, AI Compute Infrastructure - Stargate
at OpenAI
USD 164,000-268,000 per year
Required Skills & Competences
System Administration @ 4, Linux @ 4, Prometheus @ 3, Communication @ 4, Oracle @ 4, GPU @ 4
Details
OpenAI's Stargate program develops and deploys massive, state-of-the-art data center campuses in partnership with industry leaders. We design for scale, speed, and reliability, and we need experienced hardware professionals who can help ensure our high-density compute environment operates at peak performance.
Responsibilities
- Serve as OpenAI’s primary on-site hardware contact, collaborating with Oracle teams and vendors to plan and coordinate maintenance, repairs, and lifecycle activities.
- Share technical requirements and verify that work performed supports OpenAI’s compute needs and agreed quality targets.
- Coordinate schedules, spare-parts planning, and issue escalation with partner teams to minimize downtime and keep operations running smoothly.
- Work with OpenAI fleet-health engineers to translate software-detected issues into on-site hardware actions in partnership with Oracle.
- Track hardware trends and provide joint recommendations with partner teams for design or operational improvements.
- Prepare documentation and runbooks that capture joint best practices and can be applied at additional campuses.
- Offer technical guidance and context to partner personnel while respecting their operational ownership.
- Collaborate with supply-chain teams to plan spares and manage hardware lifecycle activities.
Requirements
- 7+ years of experience in datacenter hardware operations, hardware engineering, or large-scale server maintenance, with at least 2 years in a senior or lead technician capacity.
- Deep knowledge of high-density server hardware, including x86 platforms, GPUs, storage devices, and power/cooling systems.
- Strong ability to diagnose hardware issues, coordinate complex repairs, and maintain productive working relationships across organizations.
- Comfortable setting technical expectations and validating outcomes through collaboration (not direct management).
- Ability to adapt quickly to changing operational conditions and solve problems at both strategic and on-site levels.
- Excellent communication skills and ability to build trust across partner teams, vendors, and internal engineering stakeholders.
- Willingness and ability to be based full-time at a partner-operated campus. Candidates must be able to work on-site in Abilene, Texas, 5 days per week.
Preferred Skills
- Familiarity with large-scale cluster management or monitoring tools (IPMI, BMC, Prometheus, Nagios) to interpret alerts and coordinate partner responses.
- Experience with GPU-accelerated compute clusters or other high-performance computing hardware.
- Knowledge of Linux/Unix system administration and command-line diagnostic tools for hardware validation.
- Industry certifications such as CompTIA Server+, OEM hardware certifications, or equivalent.
- Experience applying Environmental Health and Safety best practices in mission-critical environments.
Benefits
- Base pay range: $164,000 - $268,000 (may vary by market location, individual experience, and other factors). The role offers equity and may include performance-related bonus(es).
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts (Health FSA, Dependent Care FSA, commuter expenses).
- 401(k) retirement plan with employer match.
- Paid parental, medical, and caregiver leave; flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
- 13+ paid company holidays and additional coordinated office closures, plus paid sick or safe time as required by law.
- Mental health and wellness support, employer-paid basic life and disability coverage, annual learning and development stipend, relocation support for eligible employees, and other fringe benefits.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them through our products. OpenAI is an equal opportunity employer and is committed to providing reasonable accommodations to applicants with disabilities.