Hardware / Software CoDesign Engineer

at OpenAI
USD 342,000-555,000 per year
MIDDLE
✅ Hybrid
✅ Relocation

Used Tools & Technologies

LLM, GPU

Required Skills & Competences

Python @ 6, Algorithms @ 3, Machine Learning @ 3, Communication @ 3, Networking @ 5, Compliance @ 3, CUDA @ 3, Deep Learning @ 2, AI @ 3

Details

OpenAI’s Hardware organization develops silicon and system-level solutions for advanced AI workloads. The hardware optimization and co-design team works closely with software and research partners to co-design hardware tightly integrated with AI models, deliver production-grade silicon for supercomputing infrastructure, and create custom design tools and methodologies optimized for AI.

About the Role

As an engineer on the hardware optimization and co-design team, you will co-design future hardware from different vendors for programmability and performance. You will collaborate with kernel, compiler, and machine learning engineers to understand their needs related to ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. You will evangelize these constraints to vendors, influence future hardware architectures for efficient training and inference, simulate workloads at different levels of abstraction, and optimize system- and rack-wide networking and memory/compute hierarchies.

This role is based in San Francisco, CA. The team uses a hybrid work model (3 days in office per week). OpenAI offers relocation assistance to new employees.

Responsibilities

  • Co-design future hardware for programmability and performance with hardware vendors
  • Assist hardware vendors in developing optimal kernels and add support for them in OpenAI's compiler
  • Develop performance estimates for critical kernels across hardware configurations and drive decisions on compute core and memory hierarchy features
  • Build system performance models at different abstraction levels and analyze scale-up, scale-out, and front-end networking decisions
  • Work with machine learning engineers, kernel engineers, and compiler developers to understand their vision for, and needs from, high-performance accelerators
  • Manage communication and coordination with internal and external partners
  • Influence hardware partner roadmaps to optimize for OpenAI workloads
  • Evaluate potential partners’ accelerators and platforms
  • As the team grows, influence roadmaps for datacenter networks, racks, and buildings

Requirements

  • 4+ years of industry experience, including experience harnessing compute at scale and optimizing ML platform code to run efficiently on target hardware
  • Strong experience in software/hardware co-design
  • Deep understanding of GPUs and/or other AI accelerators
  • Experience with CUDA, Triton, or a related accelerator programming language
  • Experience improving machine learning accuracy under low-precision numeric formats
  • Experience with system performance modeling and analysis to optimize ML model deployment
  • Strong coding skills in C/C++ and Python
  • Familiarity with fundamentals of deep learning computing and chip architecture/microarchitecture
  • Ability to actively collaborate with ML engineers, kernel writers, compiler developers, system engineers, and chip architects/microarchitects

Nice to Have

  • PhD in Computer Science or Engineering with a specialization in Computer Architecture, Parallel Computing, Compilers, or another systems area
  • Strong understanding of LLMs and challenges related to their training and inference

Benefits and Perks

  • Medical, dental, and vision insurance for employees and families
  • Mental health and wellness support
  • 401(k) plan with employer match
  • Unlimited time off and 18+ company holidays per year
  • Paid parental leave (20 weeks) and family-planning support
  • Annual learning & development stipend ($1,500 per year)
  • Daily meals in offices and meal delivery credits, where eligible
  • Relocation support for eligible employees

Compliance and Other Notes

  • Candidates may need to meet certain legal status requirements to comply with U.S. export control laws and regulations
  • Background checks will be administered in accordance with applicable law
  • OpenAI is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities