One thing that surprised me when diving into the Codex internals was that the reasoning tokens persist during the agent tool-call loop but are discarded after every user turn.
This helps keep the context window from filling up over many turns, but it can also mean some reasoning context is lost between two related user turns.
A strategy that's helped me here is having the model write progress updates (along with general plans/specs/debug notes/etc.) to markdown files, acting as a sort of "snapshot" that carries across many context windows.
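For illustration, the sketch below is roughly the shape those snapshots take; the file path and helper function are my own invention, not anything Codex-specific:

```python
# Hypothetical sketch: append a timestamped progress entry to a notes file
# that the agent is instructed (e.g. via AGENTS.md) to keep up to date.
from datetime import datetime, timezone
from pathlib import Path

NOTES = Path("docs/agent-progress.md")  # assumed location; pick your own

def snapshot(summary: str) -> None:
    """Append a dated entry so a fresh session can rebuild context from it."""
    NOTES.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"\n## {stamp}\n\n{summary}\n")

snapshot("Auth middleware refactor done; next: migrate session store to Redis.")
```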
I made a skill that reflects on past conversations via parallel headless Codex sessions. It's great for context building. Repo: https://github.com/olliepro/Codex-Reflect-Skill
This is effective, and it's convenient to have all that stuff co-located with the code, but I've found it causes problems in team environments, or really anywhere you want to work on multiple branches concurrently. I haven't come up with a good answer yet, but I think my next experiment is to offload that stuff to a daemon with external storage, and then have a CLI client that the agent (or a human) can drive to talk to it.
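Very roughly, the client side could look like the sketch below; the daemon, its endpoints, and the port are all hypothetical at this point:

```python
# Hypothetical CLI client for the note-daemon idea: notes live in external
# storage behind a local HTTP service, so they're shared across branches
# and worktrees. All endpoints here are made up for illustration.
import json
import sys
import urllib.request

DAEMON = "http://127.0.0.1:8377"  # assumed daemon address

def save(note: str) -> None:
    req = urllib.request.Request(
        f"{DAEMON}/notes",
        data=json.dumps({"text": note}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

def recent(limit: int = 10):
    with urllib.request.urlopen(f"{DAEMON}/notes?limit={limit}") as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    if sys.argv[1:2] == ["save"]:
        save(sys.argv[2])
    else:
        print(json.dumps(recent(), indent=2))
```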
Where do you save the progress updates? And do you delete them afterwards, or do you end up with 100+ progress updates each time you have Claude or Codex implement a feature or change?
I’ve been using agent-shell in Emacs a lot, and it stores transcripts of the entire interaction. It’s helped me out a lot of times because I can say ‘look at the last transcript here’.
It’s not the agent’s responsibility to write this transcript; it’s Emacs’s, so I don’t have to worry about the agent forgetting to log something. It’s just writing the buffer to disk.
Same here! I think it would be good if the tooling did this by default. I've seen others using SQL for the same purpose, and even a proposal for representing this handoff data in a succinct, compact form.
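For the SQL variant, I'd imagine something as small as this (schema invented for illustration):

```python
# Minimal sketch of the SQL approach: one SQLite table for handoff entries.
import sqlite3

con = sqlite3.connect("handoff.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS handoff (
           id INTEGER PRIMARY KEY,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP,
           session TEXT,
           summary TEXT
       )"""
)
con.execute(
    "INSERT INTO handoff (session, summary) VALUES (?, ?)",
    ("auth-refactor", "Middleware done; Redis migration pending."),
)
con.commit()
for row in con.execute(
    "SELECT created_at, summary FROM handoff ORDER BY id DESC LIMIT 5"
):
    print(*row)
```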
That could explain the "churn" when it gets stuck. Do you think it needs to maintain internal state over time to keep track of longer threads, or are written notes enough to bridge the gap?
I don't think this is true.
I'm pretty sure that Codex uses reasoning.encrypted_content=true and store=false with the Responses API.
reasoning.encrypted_content=true - The server will return all the reasoning tokens in an encrypted blob you can pass along in the next call. Only OpenAI can decrypt them.
store=false - The server will not persist anything about the conversation on the server. Any subsequent calls must provide all context.
Combined, the two options above turn the Responses API into a stateless one. Without them, the API will still persist reasoning tokens across an agentic loop, but it does so statefully, without the client passing the reasoning along each time.
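A minimal sketch of that stateless loop with the openai Python SDK; the model name is a placeholder, and whether Codex sets exactly these options is my inference rather than something confirmed from the source:

```python
# Stateless Responses API loop: the server stores nothing (store=False), and
# reasoning comes back as an encrypted item the client replays on later calls.
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="o4-mini",  # placeholder; any reasoning model
    input=[{"role": "user", "content": "Plan the refactor."}],
    store=False,
    include=["reasoning.encrypted_content"],
)

# Because the server kept nothing, the next call replays the full prior
# output (including the encrypted reasoning items) plus the new message.
second = client.responses.create(
    model="o4-mini",
    input=first.output + [{"role": "user", "content": "Now apply step 1."}],
    store=False,
    include=["reasoning.encrypted_content"],
)
print(second.output_text)
```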
But that's why I like Codex CLI: it's so bare-bones and lightweight that I can build lots of tools on top of it. Persistent thinking tokens? Let me have that using a separate file the AI writes to. The reasoning tokens we see aren't the actual tokens anyway; the model does a lot more behind the scenes, but the API keeps them hidden (all providers do that).
I think this explains why I'm not getting the most out of Codex: I like to interrupt and respond to things I see in the reasoning tokens.
It depends on the API path. Chat Completions does what you describe, but isn't it legacy at this point?
I've only used Codex with the Responses v1 API, and there it's the complete opposite: already-generated reasoning tokens persist even when you send another message (without rolling back) after cancelling turns before they've finished the thought process.
Also, with Responses v1, xhigh mode eats through the context window several times faster than the other modes, which checks out with this.