Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 3
Linux @ 3
Networking @ 3
Rust @ 6
AI @ 3
Details
About the Team
OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team builds AI-native silicon and tightly co-designs hardware with software and research partners, delivering production-grade silicon for OpenAI’s supercomputing infrastructure and custom design tools and methodologies to accelerate hardware innovation.
Role overview
You will design, build, integrate, and validate low-level system software for the manageability and health of OpenAI's first-party AI hardware systems. Work spans BMC, Linux, firmware interfaces, automation infrastructure, boot and recovery, hardware diagnostics, telemetry, host and platform drivers, network software interfaces, and manufacturing and fleet readiness. A major responsibility is owning the acceptance path for partner-delivered system software: defining requirements, reviewing code and artifacts, reproducing builds, building tests, pushing fixes, and producing launch-readiness evidence.
Responsibilities
- Design, develop, and maintain low-level firmware and system software for first-party AI hardware manageability, including BMC software, Redfish services, gNMI telemetry, firmware update and recovery flows, BIOS/UEFI interactions, platform drivers, and hardware diagnostics.
- Own integration and acceptance of partner and vendor software releases: requirements, code and artifact review, reproducible builds, CI, regression monitoring, version tracking, acceptance criteria, and launch-readiness evidence.
- Build and maintain automation and CI infrastructure for testing and managing systems in lab environments.
- Define and debug hardware management protocols across accelerators, host systems, management controllers, firmware, and platform services; work with interfaces such as I2C, SMBus, PMBus, PCIe, Ethernet, GPIO, UART, and JTAG.
- Build system health monitoring, telemetry, remote diagnostics, and recovery paths for lab, manufacturing partners, and production data centers.
- Develop validation and test automation for board bring-up, rack bring-up, qualification, manufacturing readiness, deployment readiness, and long-term reliability.
- Convert engineering releases into manufacturing-ready software recipes: images, versions, logs, limits, remediation mapping, provisioning hooks, secure artifact handling, and traceable data export.
- Debug complex production issues spanning hardware signals, BMC firmware, BIOS/UEFI, kernel drivers, platform services, network topology, PCIe behavior, power, thermals, boot, provisioning, and manufacturing tests.
- Partner with hardware, firmware, security, networking, infrastructure, manufacturing, operations, and external engineering teams to define software contracts and drive issues to closure.
- Produce architecture notes, runbooks, validation records, and decision documents to help teams reproduce, operate, and improve platforms.
Requirements
- 7+ years of hands-on experience (or equivalent demonstrated accomplishments) in low-level system software, embedded software, firmware, BMC software, platform software, device drivers, or hardware diagnostics.
- Strong programming skills in C, C++, Rust, or similar systems languages; experience building reliable software for real hardware.
- Experience with Linux-based hardware platforms, embedded Linux, OpenBMC, Redfish, BMCWeb, IPMI boundaries, BIOS/UEFI, bootloaders, firmware update systems, kernel drivers, RTOS, or fleet management software.
- Strong knowledge of hardware/software interfaces such as I2C, SMBus, PMBus, SPI, PCIe, Ethernet, USB, UART, GPIO, JTAG, power controllers, board-level debug tools, or protocol analyzers.
- Demonstrated ability to debug live hardware using logs, packet captures, firmware traces, bus captures, lab hosts, BMC journals, Linux tooling, and controlled experiments.
- Experience with hardware bring-up, manufacturing or qualification testing, system diagnostics, release validation, or deployment of high-performance compute, accelerator, server, networking, storage, or embedded platforms.
- Ability to reason across software, firmware, hardware, manufacturing, and operations boundaries; convert ambiguous problems into clear requirements, designs, tests, and decisions.
- Proven track record working with external vendors, manufacturing partners, or partner engineering teams to define deliverables and drive issues to closure.
- Familiarity with platform security topics such as secure boot, firmware signing, device provisioning, attestation, certificate handling, trusted update flows, or access-control design is a plus.
Location & Work Model
- Location: San Francisco, CA (Hybrid: 3 days/week onsite).
- Relocation assistance available.
Benefits
- Base pay range listed for this role; offers include equity.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
- 401(k) retirement plan with employer match.
- Paid parental leave, paid medical and caregiver leave.
- Paid time off (flexible PTO for exempt employees) and paid company holidays / office closures.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees and other taxable fringe benefits (charitable donation matching, wellness stipends).
Other notes
- Candidates may need to meet certain legal status requirements to comply with U.S. export control laws and regulations.
- OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.