Required Skills & Competences
Security @ 3, Docker @ 3, Go @ 3, Kubernetes @ 3, Python @ 3, Distributed Systems @ 3, Azure @ 6, React @ 3, Angular @ 3, Node.js @ 3, API @ 6, GraphQL @ 6
Details
Full Stack engineers within the Fleet Scheduling team build intuitive, scalable interfaces that empower researchers to efficiently manage AI workloads across some of the largest supercomputers in the world. The team focuses on high-performance systems that provide real-time insights, resource tracking, and seamless interaction with complex infrastructure to optimize resource allocation, minimize operational overhead, and enhance researcher productivity and system transparency.
About the Role
You will design, develop, and operate web-based systems that provide a powerful and intuitive interface to supercomputing clusters. You will collaborate closely with researchers, product, and infrastructure teams to deliver scalable solutions for monitoring, job scheduling, and resource management. The role sits at the forefront of AI infrastructure: you will design tools that handle exascale workloads while remaining usable and performant.
This role is based in San Francisco, CA. The team uses a hybrid work model (3 days in the office per week) and offers relocation assistance to new employees.
Responsibilities
- Design and develop full-stack web applications to track, monitor, and manage large-scale AI workloads in real time.
- Collaborate with researchers and infrastructure teams to translate complex operational needs into intuitive UIs and scalable backends.
- Build data visualization tools (e.g., Gantt charts, dashboards) to provide insights into job scheduling and resource allocation.
- Optimize backend services to handle massive data throughput while ensuring low-latency performance and high availability.
- Implement frontend components that provide seamless interactions with scheduling, storage, and compute systems.
- Ensure system security, reliability, and scalability across globally distributed supercomputing infrastructure.
Requirements
- Significant experience in full-stack development, with expertise in modern frontend frameworks (React, Vue, or Angular) and backend technologies (Python, Go, or Node.js).
- Experience building scalable, high-performance web applications for complex distributed systems.
- Strong understanding of RESTful and GraphQL APIs, distributed databases, and cloud infrastructure (especially Azure).
- Execution-focused with a keen eye for usability, performance, and scalability in enterprise-scale systems.
- Comfortable working in fast-paced, highly collaborative environments with tight timelines and evolving priorities.
Nice to have / Bonus
- Experience with Kubernetes, Docker, and cloud-native application deployment.
- Understanding of AI/ML workload scheduling and orchestration challenges.
- Experience with real-time data processing, visualization libraries, and observability tooling.
Benefits
- Base pay within the range listed for this role; total compensation may include generous equity and performance-related bonuses.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
- 401(k) retirement plan with employer match.
- Paid parental leave and paid medical/caregiver leave.
- Flexible PTO and paid company holidays/office closures.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend.
- Daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company is an equal opportunity employer and provides reasonable accommodations to applicants with disabilities. Background checks are administered in accordance with applicable law.