System Software Engineer, First-Party Hardware

at OpenAI
USD 266,000-445,000 per year
MIDDLE
✅ Hybrid
✅ Relocation

Required Skills & Competences

Security @ 3, Linux @ 3, Networking @ 3, Rust @ 6, AI @ 3

Details

About the Team

OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team builds AI-native silicon and tightly co-designs hardware with software and research partners. It delivers both production-grade silicon for OpenAI’s supercomputing infrastructure and custom design tools and methodologies that accelerate hardware innovation.

Role overview

You will design, build, integrate, and validate low-level system software for the manageability and health of OpenAI’s first-party AI hardware systems. The work spans BMC software, Linux, firmware interfaces, automation infrastructure, boot and recovery, hardware diagnostics, telemetry, host and platform drivers, network software interfaces, and manufacturing and fleet readiness. A major responsibility is owning the acceptance path for partner-delivered system software: defining requirements, reviewing code and artifacts, reproducing builds, building tests, pushing fixes, and producing launch-readiness evidence.

Responsibilities

  • Design, develop, and maintain low-level firmware and system software for first-party AI hardware manageability, including BMC software, Redfish services, gNMI telemetry, firmware update and recovery flows, BIOS/UEFI interactions, platform drivers, and hardware diagnostics.
  • Own integration and acceptance of partner and vendor software releases: requirements, code and artifact review, reproducible builds, CI, regression monitoring, version tracking, acceptance criteria, and launch-readiness evidence.
  • Build and maintain automation and CI infrastructure for testing and managing systems in lab environments.
  • Define and debug hardware management protocols across accelerators, host systems, management controllers, firmware, and platform services; work with interfaces such as I2C, SMBus, PMBus, PCIe, Ethernet, GPIO, UART, and JTAG.
  • Build system health monitoring, telemetry, remote diagnostics, and recovery paths for lab environments, manufacturing partners, and production data centers.
  • Develop validation and test automation for board bring-up, rack bring-up, qualification, manufacturing readiness, deployment readiness, and long-term reliability.
  • Convert engineering releases into manufacturing-ready software recipes: images, versions, logs, limits, remediation mapping, provisioning hooks, secure artifact handling, and traceable data export.
  • Debug complex production issues spanning hardware signals, BMC firmware, BIOS/UEFI, kernel drivers, platform services, network topology, PCIe behavior, power, thermals, boot, provisioning, and manufacturing tests.
  • Partner with hardware, firmware, security, networking, infrastructure, manufacturing, operations, and external engineering teams to define software contracts and drive issues to closure.
  • Produce architecture notes, runbooks, validation records, and decision documents to help teams reproduce, operate, and improve platforms.

Requirements

  • 7+ years of hands-on experience (or equivalent demonstrated accomplishments) in low-level system software, embedded software, firmware, BMC software, platform software, device drivers, or hardware diagnostics.
  • Strong programming skills in C, C++, Rust, or similar systems languages; experience building reliable software for real hardware.
  • Experience with Linux-based hardware platforms, embedded Linux, OpenBMC, Redfish, BMCWeb, IPMI boundaries, BIOS/UEFI, bootloaders, firmware update systems, kernel drivers, RTOS, or fleet management software.
  • Strong knowledge of hardware/software interfaces such as I2C, SMBus, PMBus, SPI, PCIe, Ethernet, USB, UART, GPIO, JTAG, power controllers, board-level debug tools, or protocol analyzers.
  • Demonstrated ability to debug live hardware using logs, packet captures, firmware traces, bus captures, lab hosts, BMC journals, Linux tooling, and controlled experiments.
  • Experience with hardware bring-up, manufacturing or qualification testing, system diagnostics, release validation, or deployment of high-performance compute, accelerator, server, networking, storage, or embedded platforms.
  • Ability to reason across software, firmware, hardware, manufacturing, and operations boundaries; convert ambiguous problems into clear requirements, designs, tests, and decisions.
  • Proven track record working with external vendors, manufacturing partners, or partner engineering teams to define deliverables and drive issues to closure.
  • Familiarity with platform security topics such as secure boot, firmware signing, device provisioning, attestation, certificate handling, trusted update flows, or access-control design is a plus.

Location & Work Model

  • Location: San Francisco, CA (Hybrid: 3 days/week onsite).
  • Relocation assistance available.

Benefits

  • Base pay range listed for this role; offers include equity.
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
  • 401(k) retirement plan with employer match.
  • Paid parental leave, paid medical and caregiver leave.
  • Paid time off (flexible PTO for exempt employees) and paid company holidays / office closures.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
  • Relocation support for eligible employees; additional taxable fringe benefits such as charitable donation matching and wellness stipends.

Other notes

  • Candidates may need to meet certain legal status requirements to comply with U.S. export control laws and regulations.
  • OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.