I was very skeptical about Codex at the beginning, but now all my coding tasks start with Codex. It's not perfect at everything, but overall it's pretty amazing. Refactoring, building something new, building something I'm not familiar with. It is still not great at debugging things.
One surprising thing that codex helped with is procrastination. I'm sure many people had this feeling when you have some big task and you don't quite know where to start. Just send it to Codex. It might not get it right, but it's almost always good starting point that you can quickly iterate on.
> It is still not great at debugging things.
It's so fascinating to me that the thread above this one on this page says the opposite, and the funniest thing is I'm sure you're both right. What a wild world we live in, I'm not sure how one is supposed to objectively analyse the performance of these things
I always wonder how people make qualitative statements like this. There are so many variables! Is it my prompt? The task? The specific model version? A good or bad branch out of the non-deterministic solution space?
Like, do you run a proper experiment where you hand the same task to multiple models several times and compare the results? Not snark by the way, I’m asking in earnest how you pick one model over another.
Same actually. Though, for some reasons Codex utterly falls down with podman, especially rootless podman. No matter how many explicit instructions I give it in the prompt and AGENTS.md, it will try to set a ton of variables and break podman. It will then try use docker (again despite explicit instructions not too) and eventually will try to sudo podman. One time I actually let it, and it reused its sudo perms to reconfigure selinux on my system, which completely broke it so that I could no longer get root on my own machine and the machine never booted again (because selinux was blocking everything). It has tried to do the same thing three times now on different projects.
So yeah, I use codex a lot and like it, but it has some really bad blind spots.
> One surprising thing that codex helped with is procrastination.
Heh. It's about the same as an efficient compilation or integration testing process that is long enough to let it do it's thing while you go and browse Hacker News.
IMHO, making feedback loops faster is going to be key to improving success rates with agentic coding tools. They work best if the feedback loop is fast and thorough. So compilers, good tests, etc. are important. But it's also important that that all runs quickly. It's almost an even split between reasoning and tool invocations for me. And it is rather trigger happy with the tool invocations. Wasting a lot of time to find out that a naive approach was indeed naive before fixing it in several iterations. Good instructions help (Agents.md).
Focusing attention on just making builds fast and solid is a good investment in any case. Doubly so if you plan on using agentic coding tools.
This works for me in general. If I am procrastinating, I ask a coding agent for a small task. If it works, I have something to improve upon. If it doesn’t work, my OCD forces me to “fix it.” :D
I think Opus + Claude Code is the more competent overall general "making things" system, while it makes sense to have a $20 Codex subscription to find bugs and review the things that Claude Code makes.
On its own, as sole author, I find Codex overcomplicates things. It will riddle your code with unnecessary helper functions and objects and pointless abstractions.
It is however useful for doing a once over for code review and finding the things that Claude rushed through.
> One surprising thing that codex helped with is procrastination.
The Roomba effect is real. The AI models do all the heavy implementation work, and when it asks me to setup an execute tests, I feel obliged to get to it ASAP.
I have similar experiences with Claude Code ;) Have you used it as well? How does it compare?
Infinitely agree with all. I was skeptical, and then tried Opus 4.5 and was blown away. Codex with 5.0 and 5.1 wasn't great, but 5.2 is big improvement. I can't do code without it because there's no point. Time and quality with the right constraints, you're going to get better code.
And same thought with both procrastination because of not knowing where to start, but also getting stuck in the middle and not knowing where to go. Literally never happens anymore. Having discussions with it for doing the planning and different options for implementations, and you get to the end with a good design description and then, what's the point of writing the code yourself when with that design, it's going to write it quickly and matching the agreements.