Hacker News

Unrolling the Codex agent loop

221 points by tosh yesterday at 8:42 PM | 103 comments

Comments

postalcoder yesterday at 11:06 PM

The best part about this blog post is that none of it is a surprise – Codex CLI is open source. It's nice to be able to go through the internals without having to reverse engineer it.

Their communication is exceptional, too. Eric Traut (of Pyright fame) is all over the issues and PRs.

https://github.com/openai/codex

westoncb yesterday at 11:06 PM

Interesting that compaction is done using an encrypted message that "preserves the model's latent understanding of the original conversation":

> Since then, the Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently. It returns a list of items that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model's latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit is exceeded.
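A rough sketch of what hitting that endpoint over raw HTTP might look like; the request body fields here are my guesses, since the quote only describes the response shape:

    // Sketch only: request fields and model name are assumptions, not documented API.
    const res = await fetch("https://api.openai.com/v1/responses/compact", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "Content-Type": "application/json",
      },
      // previousItems: the conversation items accumulated so far
      body: JSON.stringify({ model: "gpt-5.1", input: previousItems }),
    });
    // The response carries the list of items described above, including the
    // type=compaction item with its opaque encrypted_content; those items
    // replace the previous input on the next turn.
    const compacted = await res.json();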

jumploops yesterday at 9:07 PM

One thing that surprised me when diving into the Codex internals was that the reasoning tokens persist during the agent tool call loop, but are discarded after every user turn.

This helps preserve context over many turns, but it can also mean some context is lost between two related user turns.

A strategy that's helped me here is having the model write progress updates (along with general plans/specs/debug notes/etc.) to markdown files, acting as a sort of "snapshot" that works across many context windows.
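For example, a snapshot file might look like this (contents entirely made up):

    # PROGRESS.md -- hypothetical session snapshot
    Done: ported auth middleware to the new router
    In progress: migrating tests; 3 failures left in the session suite
    Next: decide whether to keep the legacy cookie fallback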

daxfohl today at 12:59 AM

I like it but wonder why it seems so slow compared to the ChatGPT web interface. I still find myself more productive copying and pasting from chat much of the time. You get virtually instant feedback, and it feels far more natural when you're tossing around ideas, seeing what different approaches look like, trying to understand the details, etc. Going back to Codex feels like you're waiting a lot longer for it to do the wrong thing anyway, so the feedback cycle is way slower and more frustrating. Specifically, I hate when I ask a question and it goes and starts editing code, which is pretty frequent. That said, it's great when it works. I just hope that someday it'll be as easy and snappy to chat with as the web interface, but still able to perform local tasks.

coffeeaddict1 yesterday at 10:12 PM

What I really want from Codex is checkpoints à la Copilot. There are a couple of issues [0][1] open about this on GitHub, but it doesn't seem to be a priority for the team.

[0] https://github.com/openai/codex/issues/2788

[1] https://github.com/openai/codex/issues/3585

SafeDusk today at 1:22 AM

These can also be observed through OTel telemetry.

I use headless codex exec a lot, but I struggle with its built-in telemetry support, which is insufficient for debugging and optimization.

Thus I made codex-plus (https://github.com/aperoc/codex-plus) for myself which provides a CLI entry point that mirrors the codex exec interface but is implemented on top of the TypeScript SDK (@openai/codex-sdk).

It exports the full session log to a remote OpenTelemetry collector after each run, which can then be debugged and optimized through codex-plus-log-viewer.
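The export side is plain OTLP; roughly this shape with the JS OTel SDK (exact setup varies by SDK version, and the collector URL and attribute names here are made up, so treat it as illustrative):

    import { logs } from "@opentelemetry/api-logs";
    import { LoggerProvider, SimpleLogRecordProcessor } from "@opentelemetry/sdk-logs";
    import { OTLPLogExporter } from "@opentelemetry/exporter-logs-otlp-http";

    // Ship each session event to a remote OTLP collector.
    const provider = new LoggerProvider();
    provider.addLogRecordProcessor(
      new SimpleLogRecordProcessor(
        new OTLPLogExporter({ url: "https://collector.example.com/v1/logs" })
      )
    );
    logs.setGlobalLoggerProvider(provider);

    const logger = logs.getLogger("codex-session");
    logger.emit({
      severityText: "INFO",
      body: "turn completed",
      attributes: { "codex.turn": 3, "codex.tool_calls": 2 },
    });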

mkw5053 yesterday at 9:00 PM

I guess nothing super surprising or new, but still a valuable read. I wish it were easier/native to reflect on the loop and/or histories while using agentic coding CLIs. I've found some success with an MCP server that lets me query my chat histories, but I have to be very explicit about its use. Also, like many things, continuous learning would probably solve this.
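The server itself can be tiny; a sketch with the TypeScript MCP SDK (the tool name and the transcript store are hypothetical):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    // Hypothetical store: replace with however you persist transcripts.
    async function searchTranscripts(query: string): Promise<string[]> {
      return [];
    }

    // Expose one tool that greps saved session transcripts.
    const server = new McpServer({ name: "history", version: "0.1.0" });
    server.tool("search_history", { query: z.string() }, async ({ query }) => ({
      content: [{ type: "text", text: (await searchTranscripts(query)).join("\n") }],
    }));
    await server.connect(new StdioServerTransport());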

rvnx yesterday at 10:02 PM

Codex agent loop:

    Call the model. If it asks for a tool, run the tool and call again (with the new result appended). Otherwise, done.
https://i.ytimg.com/vi/74U04h9hQ_s/maxresdefault.jpg
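That really is the whole skeleton. A minimal sketch of the same loop against the Responses API (the model name, tool list, and runTool are placeholders, not Codex's actual internals):

    import OpenAI from "openai";

    const client = new OpenAI();
    const tools: any[] = []; // placeholder: shell, apply_patch, ...
    async function runTool(call: any): Promise<string> {
      return "tool output"; // stub tool runner
    }

    let input: any[] = [{ role: "user", content: "fix the failing test" }];
    while (true) {
      const res = await client.responses.create({ model: "gpt-5.1", input, tools });
      const calls = res.output.filter((item: any) => item.type === "function_call");
      if (calls.length === 0) break; // no tool requested: done
      input = input.concat(res.output); // keep model output (incl. tool calls) in context
      for (const call of calls) {
        input.push({
          type: "function_call_output",
          call_id: call.call_id,
          output: await runTool(call), // run the tool, feed the result back
        });
      }
    }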
tecoholic yesterday at 10:19 PM

I use two CLIs: Codex and Amp. Almost every time I need a quick change, Amp finishes the task in the time it takes Codex to build context. I think it's got a lot to do with the system prompt and the "read loop" as well: Amp would read multiple files in one go and get to the task, but Codex would crawl through the files almost one by one. Anyone noticed this?

written-beyond yesterday at 9:38 PM

Has anyone seriously used Codex CLI? I was using LLMs for code gen, usually through the VS Code Codex extension, Gemini CLI, and Claude Code. The performance of all three of them is utter dog shit; Gemini CLI just randomly breaks and starts spamming content trying to reorient itself after a while.

However, I decided to try Codex CLI after hearing they rebuilt it from the ground up in Rust (instead of JS; not implying Rust == better). Its performance is quite literally insane, and its UX is completely seamless. They even added small nice-to-haves like ctrl+left/right to skip your cursor to word boundaries.

If you haven't, I genuinely think you should give it a try; you'll be very surprised. I saw Theo (of Ping Labs) talk about how OpenAI shouldn't have wasted their time optimizing the CLI and should have just made a better model or something. I highly disagree after using it.

dfajgljsldkjag yesterday at 9:20 PM

The best part about this is how the program acts like a human who is learning by doing. It is not trying to be perfect on the first try; it is just trying to make progress by looking at the results. I think this method is going to make computers much more helpful because they can now handle the messy parts of solving a problem.

mohsen1 yesterday at 11:14 PM

Tool calling during thinking is something similar to this, I'm guessing. DeepSeek has a paper on this.

Or am I not understanding this right?

I_am_tiberius yesterday at 11:52 PM

Pity it doesn't support other LLMs.

MultifokalHirn yesterday at 9:06 PM

thx :)

ppeetteerr yesterday at 9:57 PM

I asked Claude to summarize the article and it was blocked, haha. Fortunately, I have the Claude plugin installed in Chrome, and it used the plugin to read the contents of the page.
