AI Systems Engineer, Codex Agents

at OpenAI

📍 San Francisco, United States

USD 230,000-385,000 per year

MIDDLE

✅ On-site

✅ Relocation

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. Proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 3, Python @ 3, Distributed Systems @ 3, Rust @ 3, Debugging @ 3, API @ 3, Experimentation @ 3, LLM @ 3, GPU @ 3, Codex @ 3, Observability @ 3, AI @ 3, Profiling @ 3

Details

About the Team

The Codex Core Agents team builds the agent harness that turns model capability into real-world action. The team owns the systems around the model: prompting and interpreting model outputs, executing actions safely in real environments, and feeding production experience back into better models and better agent behavior. The work spans the harness, model interaction, inference, sandboxed execution, orchestration, evals, production reliability, and the performance envelope around tokens, latency, cost, capacity, and quality. The harness is open source and increasingly part of how models are trained and evaluated.

About the Role

We are looking for engineers to build the AI systems that make Codex agents dependable in production. The ideal candidate is an agent-systems builder: hands-on across low-level systems and ML workflows, able to debug Codex behavior end-to-end across the harness, model behavior, inference/runtime stack, GPU fleet, and product surface. You will work with research, infrastructure, and product to design agent harness capabilities, run experiments and ablations across the model + system prompt + harness stack, build frameworks for assessing production agent performance, and turn messy failures into durable improvements.

Responsibilities

Design and build the core agent harness and execution loop that lets Codex agents interpret model outputs, use tools, execute code, and complete long-horizon tasks safely.
Build sandboxing, isolation, orchestration, state, and workflow infrastructure for agents operating in real development environments.
Develop evaluation, experimentation, and debugging systems that distinguish among harness issues, model behavior, inference/runtime issues, and product failures.
Run ablations across prompts, model-facing interfaces, context construction, tool-use strategies, and harness behavior to improve solve rate, reliability, latency, and cost.
Improve observability, profiling, and diagnostics across the agent stack, from backend systems to inference, GPUs, and fleet capacity.
Work closely with research to make the harness trainable, measurable, and useful for improving frontier agentic models.
Build shared primitives that make Codex faster, safer, more reliable, and easier for other teams and open-source users to build on.

Requirements

Experience building or operating production systems in distributed systems, infrastructure, developer tooling, sandboxing, virtualization, cloud platforms, or ML systems.
Comfort working across layers: Rust systems code, Python configuration layers, APIs, agent orchestration, evals, logs/traces, inference behavior, runtime constraints, and user outcomes.
Hands-on experience with LLM applications, coding agents, evals, model deployment, inference, compiler/runtime performance, or developer platforms.
Strong focus on reliability, safety, performance, debuggability, and clean abstractions.
Ability to debug from evidence and move quickly from ambiguous production failures to practical, durable fixes.
Willingness to work closely with research while shipping production changes, write meaningful code, show strong ownership, and lead scoped or multi-team AI systems work.

Bonus Points

Deep Rust, systems, sandboxing, isolation, or low-level platform experience.
Experience with coding agents, agent harnesses, tool-using LLM systems, model evals, or post-training feedback loops.
Background in compilers, kernels, runtimes, inference optimization, GPU systems, benchmarking, profiling, or performance engineering.
Experience building production infrastructure used by many engineers or users under demanding reliability and security constraints.
Open-source infrastructure or developer-platform work with strong taste for APIs and usability.

Benefits

Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses.
401(k) retirement plan with employer match.
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks).
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
13+ paid company holidays and multiple company office closures, plus paid sick or safe time as required by law.
Mental health and wellness support; employer-paid basic life and disability coverage.
Annual learning and development stipend.
Daily meals in offices and meal delivery credits as eligible.
Relocation support for eligible employees.
Additional taxable fringe benefits (charitable donation matching, wellness stipends).

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety, diverse perspectives, and equal employment opportunity. Background checks and reasonable accommodations are administered consistent with applicable law.