Full-Stack Software Engineer, Compute Foundations

at OpenAI
USD 230,000-347,000 per year
MIDDLE
✅ On-site
✅ Relocation

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Security @ 3 Docker @ 3 Go @ 3 Kubernetes @ 3 Python @ 3 Distributed Systems @ 3 React @ 3 Angular @ 3 Node.js @ 3 API @ 3 Observability @ 3 AI @ 3

Details

The Frontier Clusters team at OpenAI builds, launches, and supports some of the largest supercomputers in the world to enable frontier model training. The team debugs issues across hardware and software, builds tools to operate clusters, and partners with researchers to improve reliability and scale of training runs. This role focuses on building web-based tools and backend services that surface cluster health, job failures, usable capacity, and operational workflows to researchers and operators.

Responsibilities

  • Build full-stack web applications that help researchers and operators understand cluster health, job failures, and usable capacity in real time.
  • Turn operational questions (e.g., “why is this job down?” or “what is preventing these nodes from being ready?”) into clear, actionable product experiences.
  • Collaborate closely with researchers and infrastructure teams to identify high-leverage operational problems and design solutions.
  • Design data models, APIs, and visualizations for inspecting large-scale scheduling, resource allocation, and cluster behavior.
  • Develop scalable backend services that process high-volume cluster and workload data with low latency and high reliability.
  • Build intuitive frontend workflows that connect users to underlying scheduling, storage, and compute systems.
  • Improve reliability, performance, and security of systems used to operate frontier compute.

Requirements

  • Significant experience in full-stack development, including modern frontend frameworks (React, Vue, or Angular) and backend technologies (Python, Go, or Node.js).
  • Experience building scalable, high-performance web applications for complex distributed systems.
  • Understanding of APIs, distributed data systems, and cloud infrastructure.
  • Execution-focused with strong attention to usability, performance, and scalability.
  • Comfortable working in fast-paced, collaborative environments with tight timelines and evolving priorities.

Bonus / Nice-to-have

  • Experience with Kubernetes, Docker, and cloud-native application deployment.
  • Understanding of AI/ML workload scheduling and orchestration challenges.
  • Experience with real-time data processing, visualization libraries, and observability tooling.

Compensation

  • Compensation Range: $230K - $347K USD

About OpenAI

OpenAI is an AI research and deployment company focused on ensuring that general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety, diverse perspectives, and provides benefits including medical/dental/vision, retirement match, parental leave, paid time off, and other employee programs. Relocation support is provided for eligible employees.