
Thoughts on Codex: GPT-5's Engineering Teammate

Published: Sep 17, 2025
Vancouver, Canada

When OpenAI announced GPT-5-Codex two days ago, my first reaction was cynical. Another incremental model update, another point release wrapped in marketing. But after spending an intensive 48 hours with it, I can tell you my cynicism was misplaced. This isn’t just a better model for writing code. It’s a fundamentally different kind of tool.

We’ve had AI that can code for a while. What we haven’t had, until now, is an AI that can engineer.

The announcement claimed GPT-5-Codex was ‘trained on complex, real-world engineering tasks.’ I believe this is the secret. It’s a subtle but profound distinction. Previous models were trained to predict the next token in a vast corpus of code. GPT-5-Codex feels like it was trained on pull requests, design docs, and terminal sessions. It thinks in workflows, not just snippets. It has an innate understanding of the process of building software.

The Workflow is the Model

I’ve written before about Agentic Coding being a new engineering primitive. GPT-5-Codex is the first purpose-built tool for that primitive. Where GPT-4 or even the base GPT-5 felt like a brilliant but stateless consultant you could call on, Codex feels like a persistent pair programmer that remembers what you’re trying to build.

My ‘aha’ moment came when I tested GPT-5-Codex (medium) on a complex refactoring problem: breaking several 1,000+ LOC Go files into smaller ones, which also required creating new modules and splitting dependencies across several files. The agentic model reasoned about the problem for 15-30 minutes and completed the entire process in two shots.

Instead of prompting it with specific file changes, I opened the new Codex CLI, attached a simple architectural diagram, and gave it one instruction: ‘Refactor the authentication module to match this new flow, ensuring all tests pass.’

It didn’t just spit out code. It created a to-do list, which it ticked off as it went. It ran tests, found a failure, and then debugged its own code before trying again. The announcement mentioned it could work for over 7 hours; while I didn’t test that limit, I watched it autonomously work through this 45-minute task, iterating and self-correcting in a way that felt less like a language model and more like a junior developer.

This is the dividend of its training data. It has learned the cadence of development: plan, act, observe, and refine.
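That plan-act-observe-refine cadence can be sketched as a simple loop. This is an illustrative skeleton of the general agentic pattern, not Codex's actual internals; all function and type names here are my own inventions.

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    done: bool = False

def run_agent_loop(plan, act, observe, refine, max_iterations=10):
    """Minimal plan-act-observe-refine loop.

    `plan`, `act`, `observe`, and `refine` are caller-supplied callables.
    This sketches the cadence described above, nothing more.
    """
    tasks = plan()                      # plan: break the goal into a to-do list
    for _ in range(max_iterations):
        pending = [t for t in tasks if not t.done]
        if not pending:
            return tasks                # every item ticked off
        task = pending[0]
        result = act(task)              # act: attempt the task (edit files, run a command)
        if observe(result):             # observe: run tests, inspect the output
            task.done = True
        else:
            tasks = refine(tasks, task, result)  # refine: adjust the plan, then retry
    return tasks
```

The self-correction I watched (run tests, hit a failure, debug, retry) is exactly the `observe`-fails-so-`refine` branch of this loop.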

The Catch-22 of Competence and Safety

This new, workflow-centric approach has an interesting side effect: it feels inherently safer. The model’s fine-tuning on engineering tasks—which involve seeking permissions, checking outputs, and using tools methodically—has baked in a level of caution.

When dealing with my local file system, it was far less presumptuous than I expected. The new approval modes in the CLI are excellent, but the model itself seems more aware of its boundaries. It proposes a diff before applying it. It explains why it needs to run a shell command.
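The propose-then-apply behavior amounts to an approval gate: render the change for review, and only write it if a human says yes. Here is a minimal sketch of that pattern; the function names and structure are my own, not the Codex CLI's implementation.

```python
import difflib

def propose_diff(path, old, new):
    """Render a unified diff for human review before anything is written."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))

def apply_if_approved(path, old, new, approve):
    """Apply the change only if the approval hook says yes.

    `approve` receives the diff text and returns True/False. In an
    interactive CLI this would prompt the user; here it is a plain callback.
    """
    diff = propose_diff(path, old, new)
    if not diff:
        return old                      # nothing to change
    if approve(diff):
        return new                      # approved: the edit takes effect
    return old                          # rejected: the content is untouched
```

The point of the gate is that a rejected proposal leaves the file byte-for-byte identical; the destructive step simply never runs.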

This is safety through competence, not just a layer of RLHF guardrails. Because it understands the engineering workflow, it understands that destructive actions are rarely the right next step. This methodical nature makes it far more trustworthy. It solves the catch-22 of agentic systems: to be useful, an agent needs powerful tools, but powerful tools are dangerous. By training the agent on the culture and process of safely using those tools, OpenAI has made it both more capable and more reliable.

Agentic Models: The Evolution of LLMs

I do think that agentic models, or agentic-friendly LLMs, are an evolution of traditional LLMs: they unlock emergent capabilities like advanced tool use and function calling. GPT-5-Codex represents this evolution in action. It’s not just a bigger, better language model; it’s a model that has internalized the engineering workflow itself.
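Function calling, mechanically, means the model emits a structured request (a tool name plus JSON arguments) and the harness dispatches it to real code. A minimal sketch of the dispatch side follows; the tool names and the exact call format here are assumptions for illustration, not any vendor's actual schema.

```python
import json

# A registry mapping tool names to plain Python callables. In a real
# agent harness the tools' schemas are also sent to the model so it
# can choose among them; this sketch shows only the dispatch side.
TOOLS = {
    "run_tests": lambda args: f"ran {args['suite']} suite",
    "read_file": lambda args: f"contents of {args['path']}",
}

def dispatch(tool_call_json):
    """Execute a model-emitted tool call of the (assumed) form
    {"name": ..., "arguments": {...}} and return its result as text."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"error: unknown tool {call['name']!r}"
    return tool(call["arguments"])
```

The result string is fed back to the model as an observation, which is what lets it execute multi-step tasks sequentially and adapt to intermediate results.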

Traditional LLMs excel at generating text and code snippets, but they remain fundamentally reactive: you prompt them, they respond. Agentic models like Codex add a layer of autonomy and workflow understanding. They can plan multi-step tasks, execute them sequentially, and adapt based on intermediate results. This isn’t just incremental improvement; it’s a qualitative shift that unlocks entirely new use cases.

The key insight is that agentic capabilities emerge from training on processes, not just content. By exposing models to real engineering workflows—complete with planning, execution, testing, and iteration—OpenAI has created something that doesn’t just know how to write code, but knows how to build software.

The Agent is Finally Home

For years, our most powerful AI models have been trapped behind chat interfaces. The revamped Codex CLI and new IDE extension finally bring the agent into the developer’s native environment. This is critical. The friction of context-switching to a browser tab is gone. The agent now lives in the terminal and the editor, with direct access to the files and context it needs.

This is the fulfillment of the promise of the agentic experience. The system is no longer just a clever tool; it’s a teammate, integrated directly into the workspace.

Pushing the Boundaries

My next goal is to test longer autonomous sessions, including multi-hour runs. The announcement claims GPT-5-Codex can work for over 7 hours autonomously, and I want to see how it performs on more complex engineering challenges that require sustained focus and multi-step problem solving.

The New Baseline

After two days of use, my conclusion is this: GPT-5-Codex (medium or higher) establishes a new baseline for what a coding agent should be. The generic, all-purpose model is still incredibly useful, but for the focused domain of software engineering, specialization is a game-changer.

The core shift is from commanding a model to collaborating with a teammate. It changes your role from a prompt engineer to a systems architect who guides the agent. The game is no longer about crafting the perfect prompt to generate a perfect function. It’s about defining a clear specification, curating the right context, and letting your agentic teammate handle the implementation. The age of the AI engineer is here.

  Let an Agentic AI Expert Review Your Code

I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.

Content Attribution: 100% by Alpha
  • 100% by Alpha: Core concepts, hands-on insights, and final structure
  • Note: Inspired by a fictional OpenAI announcement for GPT-5-Codex. All usage 'experience' is a creative projection based on the announcement's details.