Hardware / Software CoDesign Engineer

at OpenAI

📍 San Francisco, United States

USD 342,000-555,000 per year

MIDDLE

✅ Hybrid

✅ Relocation

Used Tools & Technologies

LLM GPU

Required Skills & Competences
Tag name is followed by "@" symbol and proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Python @ 6 Algorithms @ 3 Machine Learning @ 3 Communication @ 3 Networking @ 5 Compliance @ 3 CUDA @ 3 Deep Learning @ 2 AI @ 3

Details

OpenAI’s Hardware organization develops silicon and system-level solutions for advanced AI workloads. The hardware optimization and co-design team works closely with software and research partners to co-design hardware tightly integrated with AI models, deliver production-grade silicon for supercomputing infrastructure, and create custom design tools and methodologies optimized for AI.

About the Role

As an engineer on the hardware optimization and co-design team you will co-design future hardware from different vendors for programmability and performance. You will collaborate with kernel, compiler, and machine learning engineers to understand needs related to ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. You will evangelize these constraints to vendors, influence future hardware architectures for efficient training and inference, simulate workloads at different abstractions, and optimize system- and rack-wide networking and memory/compute hierarchies.

This role is based in San Francisco, CA. The team uses a hybrid work model (3 days in office per week). OpenAI offers relocation assistance to new employees.

Responsibilities

Co-design future hardware for programmability and performance with hardware vendors
Assist hardware vendors in developing optimal kernels and add support for them in OpenAI's compiler
Develop performance estimates for critical kernels across hardware configurations and drive decisions on compute core and memory hierarchy features
Build system performance models at different abstraction levels and analyze scale-up, scale-out, and front-end networking decisions
Work with machine learning engineers, kernel engineers, and compiler developers to understand their vision and needs from high-performance accelerators
Manage communication and coordination with internal and external partners
Influence hardware partner roadmaps to optimize for OpenAI workloads
Evaluate potential partners’ accelerators and platforms
As the team grows, influence roadmaps for datacenter networks, racks, and buildings

Requirements

4+ years of industry experience, including experience harnessing compute at scale and optimizing ML platform code to run efficiently on target hardware
Strong experience in software/hardware co-design
Deep understanding of GPUs and/or other AI accelerators
Experience with CUDA, Triton, or a related accelerator programming language
Experience driving machine learning accuracy with low-precision formats
Experience with system performance modeling and analysis to optimize ML model deployment
Strong coding skills in C/C++ and Python
Familiarity with fundamentals of deep learning computing and chip architecture/microarchitecture
Ability to actively collaborate with ML engineers, kernel writers, compiler developers, system engineers, and chip architects/microarchitects

Nice to Have

PhD in Computer Science or Engineering with specialization in Computer Architecture, Parallel Computing, Compilers, or other systems
Strong understanding of LLMs and challenges related to their training and inference

Benefits and Perks

Medical, dental, and vision insurance for employees and families
Mental health and wellness support
401(k) plan with employer match
Unlimited time off and 18+ company holidays per year
Paid parental leave (20 weeks) and family-planning support
Annual learning & development stipend ($1,500 per year)
Daily meals in offices and meal delivery credits as eligible
Relocation support for eligible employees

Compliance and Other Notes

Candidates may need to meet certain legal status requirements to comply with U.S. export control laws and regulations
Background checks will be administered in accordance with applicable law
OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities.