Software Engineer, Productivity - Inference Runtime

at OpenAI
USD 230,000-385,000 per year
MIDDLE
✅ On-site
✅ Relocation

Required Skills & Competences

Python @ 3, CI/CD @ 3, Distributed Systems @ 3, Hiring @ 3, Debugging @ 3, API @ 3, ChatGPT @ 3, GPU @ 3, Codex @ 3, Observability @ 6

Details

We’re hiring a Developer Productivity engineer to support OpenAI’s Inference Runtime teams. These teams own systems responsible for serving models reliably, efficiently, and safely across Codex, ChatGPT, API, and internal research workloads. This role sits at the intersection of developer experience, CI/CD infrastructure, release engineering, production readiness, and inference systems reliability. You will work on tooling and operational foundations that support model launches, inference optimizations, cloud provider integrations, and large-scale deployments across a rapidly evolving inference stack.

Responsibilities

  • Improve systems that ensure inference engine releases are correct, performant, and regression-free by evolving tooling and infrastructure for deploy gate validation.
  • Bring rigor to release, validation, branching, and deployment processes across the inference stack.
  • Improve canary, async, and large-scale validation workflows for inference systems.
  • Harden CI, testing, and validation infrastructure so failures are actionable and trustworthy.
  • Reduce noisy or flaky failures caused by infrastructure instability, GPU scheduling, or test environment issues.
  • Build automation for failure triage, ownership detection, debugging, and escalation.
  • Partner closely with inference teams, research developer productivity, engine acceleration, and infrastructure teams to improve release quality and rollout safety.
  • Reduce developer friction in testing, debugging, and release workflows to enable engineers to move faster with confidence.

Requirements

  • Strong experience with CI/CD systems, testing infrastructure, release tooling, developer productivity, or large-scale build and validation systems.
  • Comfort working in Python-heavy environments and debugging complex distributed systems; C++ experience is helpful but not required.
  • Experience or strong interest in improving observability, rollout safety, release automation, and developer self-service tooling.
  • Ability to harden systems that catch issues before they reach production, reduce noise from flaky or infra-related test failures, and automate triage and escalation workflows.
  • High ownership, strong developer empathy, and comfort operating in ambiguous, cross-functional areas without a fully predefined roadmap.
  • Eagerness to learn about large-scale inference systems; prior inference experience is not required.

Benefits

  • Base pay range listed separately; total compensation may include equity and performance-related bonuses.
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
  • Pre-tax accounts (Health FSA, Dependent Care FSA, commuter expenses).
  • 401(k) retirement plan with employer match.
  • Paid parental leave and paid medical/caregiver leave; flexible PTO and paid company holidays.
  • Mental health and wellness support; employer-paid basic life and disability coverage.
  • Annual learning and development stipend; daily meals in offices and meal delivery credits, where eligible.
  • Relocation support for eligible employees.
  • Additional fringe benefits (charitable donation matching, wellness stipends) as applicable.