Used Tools & Technologies
Not specified
Required Skills & Competences
Cumulus Linux @ 3, Go @ 2, Linux @ 3, Python @ 6, CI/CD @ 2, Networking @ 3, Rust @ 2, Debugging @ 3
Details
OpenAI’s Hardware organization develops silicon and system-level solutions for advanced AI workloads. The team builds AI-native silicon and tightly co-designs hardware with software and research partners. This role focuses on bootstrapping and scaling the switching layer of OpenAI’s AI supercomputers by building and maintaining custom SONiC NOS images from scratch across the Linux kernel, switch ASIC SAI/SDKs, platform drivers, control-plane services, and orchestration layers.
This role is based in San Francisco, CA. The team uses a hybrid work model (3 days in the office per week) and offers relocation assistance to new employees.
Responsibilities
- Design, develop, and maintain custom SONiC NOS images for large-scale AI fabrics.
- Integrate and configure Linux kernel components, device drivers, switch ASIC SDKs, and SAI layers.
- Bring up new switch platforms (thermal/fan control, power monitoring, transceiver management, watchdogs, OSFP CMIS, LEDs, CPLDs, etc.).
- Extend and customize SONiC services for routing, telemetry, control-plane state, and distributed automation.
- Validate ASIC configurations; perform link bring-up, SerDes tuning, and buffer profile adjustments; and establish performance baselines.
- Evaluate switch silicon SDK releases, track vendor deliverables, and define platform requirements with vendors and ASIC partners.
- Debug complex issues spanning kernel, platform drivers, SONiC containers, routing agents, orchestration services, hardware signals, and network topology.
- Integrate switches into fleet-wide monitoring, remote diagnostics, telemetry pipelines, and automated lifecycle workflows.
- Develop robust CI/build pipelines for reproducible NOS builds and controlled rollout across the fleet.
- Support factory bring-up and qualification through to mass deployment.
- Collaborate on architecting and deploying networking protocols and technologies to maximize performance and reliability at AI scale.
Requirements
- Proven experience with SONiC or comparable NOS stacks (FBOSS, Cumulus Linux, Arista EOS, Junos PFE-level integration, etc.).
- Experience updating OpenConfig gNMI interfaces and YANG data models.
- Strong background in the Linux kernel, network device drivers, and low-level OS internals.
- Experience integrating Broadcom / Marvell / NVIDIA / Intel ASIC SDKs and SAI implementations.
- Proficiency in C and C++; strong knowledge of Python. Familiarity with Rust or Go is a plus.
- Deep understanding of L2/L3 forwarding, ECMP, RoCE, BGP, QoS, PFC, buffer tuning, and telemetry.
- Hands-on experience with hardware platform bring-up and board-level debugging.
- Familiarity with CI/CD pipelines, distributed config/state management, and large-scale automation.
- Strong cross-functional problem-solving skills in high-performance, distributed environments.
- Ability to lead teams and deliver projects end to end.
Benefits
- Base salary range: $310,000 – $460,000. Total compensation also includes equity and, for eligible employees, performance-related bonuses.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts; pre-tax accounts (FSA); 401(k) with employer match.
- Paid parental leave, paid medical and caregiver leave, flexible PTO, 13+ paid company holidays, and paid sick/safe time as required by law.
- Mental health and wellness support, employer-paid basic life and disability coverage, annual learning & development stipend.
- Daily meals in offices and meal delivery credits as eligible; additional taxable fringe benefits (charitable donation matching, wellness stipends).
- Relocation support for eligible employees.
Additional notes
- Background checks will be administered in accordance with applicable law. OpenAI is an equal opportunity employer and is committed to reasonable accommodations for applicants with disabilities.