simianwords yesterday at 6:34 PM

Why would someone use Codex instead?


Replies

lmeyerov yesterday at 8:39 PM

In our evals for answering cybersecurity incident investigation questions, and even for autonomously doing the full investigation, gpt-5.2-codex with low reasoning was the clear winner over non-codex models and higher reasoning settings: 2x+ faster, higher completion rates, etc.

It was generally smarter than pre-5.2 models, so strategically better, and the codex variant likewise wrote better database queries than non-codex. Since it needs to iteratively hunt down the answer, it also helped that it didn't run out the clock by drowning in reasoning.

Video: https://media.ccc.de/v/39c3-breaking-bots-cheating-at-blue-t...

We'll be updating the numbers for 5.3 and Claude, but it's basically the same thing there. It's early, but we were surprised to see Codex outperform Opus here.

jeswin yesterday at 7:10 PM

When it comes to lengthy, non-trivial work, Codex is much better but also slower.

surgical_fire yesterday at 6:48 PM

I've been using Codex for software development personally (I have a ChatGPT account), and I use Claude at work (since it is provided by my employer).

I find that Codex and Claude Opus perform at a similar level, and in some ways I actually prefer Codex (I keep hitting quota limits on Opus and have to fall back to Sonnet).

If your question is related to morality (the thing about US politics, DoD contract and so on)... I am not from the US, and I don't care about its internal politics. I also think both OpenAI and Anthropic are evil, and the world would be better if neither existed.

synergy20 yesterday at 9:02 PM

In my testing Codex actually planned worse than Claude, but coded better and faster once the plan was set. It is also excellent for cross-checking Claude's work; it finds significant weaknesses every time.

embedding-shape yesterday at 6:40 PM

Why would someone use Claude Code instead? Or any other harness? Or why only use one?

My own tooling fires off requests to multiple agents at the same time, then I compare the results, pick the best one, and continue from there. Most of the time Codex ends up with the best end result, but my hunch is that at some point that'll change, hence I keep using multiple agents at the same time.
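
For anyone curious what that kind of fan-out could look like, here is a minimal sketch in Python. It is not the actual tooling described above: the agent functions (`fake_codex`, `fake_claude`) are hypothetical stand-ins for real harness calls, and scoring by output length is only a placeholder assumption for whatever comparison (often a manual read-through) you would really do.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List

# Hypothetical agent type: takes a prompt, returns the agent's text output.
Agent = Callable[[str], Awaitable[str]]


@dataclass
class Result:
    agent: str
    output: str


# Stand-in agents (assumptions). Real ones would call a CLI or an API.
async def fake_codex(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return f"codex: detailed answer to {prompt}"


async def fake_claude(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return f"claude: answer to {prompt}"


async def fan_out(prompt: str, agents: Dict[str, Agent]) -> List[Result]:
    """Send the same prompt to every agent concurrently."""

    async def run(name: str, agent: Agent) -> Result:
        return Result(name, await agent(prompt))

    # gather() runs the coroutines concurrently and keeps input order.
    return list(await asyncio.gather(*(run(n, a) for n, a in agents.items())))


def pick_best(results: List[Result], score: Callable[[str], float]) -> Result:
    """Return the result with the highest score (scoring is pluggable)."""
    return max(results, key=lambda r: score(r.output))


def run_comparison(prompt: str):
    agents = {"codex": fake_codex, "claude": fake_claude}
    results = asyncio.run(fan_out(prompt, agents))
    # Placeholder scoring: longer output wins. Swap in tests, a judge, or a human.
    return results, pick_best(results, score=len)


if __name__ == "__main__":
    all_results, best = run_comparison("fix the bug")
    for r in all_results:
        print(r.agent, "->", r.output)
    print("picked:", best.agent)
```

The useful property is that the scoring function is kept separate from the fan-out, so if a different agent starts producing the best results, nothing else in the loop has to change.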