Applied AI Engineer, Codex Core Agent
📍 New York City, United States
📍 San Francisco, United States
📍 Seattle, United States
Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — expert level. You can teach others and you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 6
Machine Learning @ 3
Debugging @ 3
Experimentation @ 3
LLM @ 3
Codex @ 3
AI @ 3
Details
The Codex Core Agent team builds the kernel of Codex. The team focuses on improving agent performance, accelerating research, and making those improvements real in production. Its work covers performance across tokens, latency, reliability, cost, and capacity; the core execution loop and interfaces that turn models into useful behavior; shared infrastructure for other teams; and feedback loops that turn real-world usage into better models and agent behavior over time.
This role is about bringing Codex agents from impressive demos to dependable tools by improving agent performance on real software engineering tasks and closing the gap between research capability and real-world usefulness. You will collaborate with research, infrastructure, and product teams to ensure agents are useful, steerable, and reliable, and to turn model and systems improvements into measurable gains in solve rate, usefulness, and economic value.
Responsibilities
- Design and iterate on agent behaviors across real-world coding tasks and long-horizon workflows.
- Work closely with research to develop and run evaluations to measure agent performance, regressions, failure modes, and edge cases.
- Improve performance through prompting, tool-use strategies, context construction, and model-facing experimentation.
- Analyze failures in production and systematically improve robustness and reliability.
- Build feedback loops and data systems that bring higher-quality real-task data into evaluation and research.
- Work with product teams to shape user-facing agent experiences and the interfaces the agent depends on.
- Help define what “good” looks like for agents completing complex tasks end-to-end.
Requirements
- Experience building or shipping machine learning or LLM-powered products.
- Strong Python skills and comfort with modern ML tooling.
- Experience with model evaluation, fine-tuning, or prompt design.
- Ability to think in terms of systems and user outcomes, not just model metrics.
- Enjoyment of debugging messy, real-world failures and turning them into improvements.
- Interest in turning research and model potential into systems that actually work for users.
Bonus (optional)
- Experience with agent frameworks or tool-using LLM systems.
- Research experience with code generation models or developer tooling.
- Experience working with large, messy datasets or production logs.
Benefits
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts: Health FSA, Dependent Care FSA, and commuter expense accounts.
- 401(k) retirement plan with employer match.
- Paid parental leave (up to 24 weeks for birth parents, 20 weeks for non-birthing parents); paid medical and caregiver leave.
- Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees.
- 13+ paid company holidays and multiple coordinated office closures; paid sick or safe time as required by law.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees.
- Additional taxable fringe benefits such as charitable donation matching and wellness stipends.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company emphasizes safety, diverse perspectives, and inclusion, and is an equal opportunity employer. Background checks are administered in accordance with applicable law. Reasonable accommodations for applicants with disabilities are available via the provided links in the original posting.