Hacker News

mccoyb · last Thursday at 6:31 PM

If anyone from OpenAI is reading this -- a plea to not screw with the reasoning capabilities!

Codex is so, so good at finding bugs and little inconsistencies, it's astounding to me. Where Claude Code is good at "raw coding", Codex/GPT-5.x is unbeatable at the careful, methodical finding of "problems" (be it in code or in math).

Yes, it takes longer (quality, not speed please!) -- but the things that it finds consistently astound me.


Replies

sinatra · yesterday at 12:59 AM

Piggybacking on this post. Codex not only finds much higher-quality issues, it also writes code that usually doesn't leave quality issues behind. Claude is much faster, but it definitely leaves serious quality issues behind.

So much so that now I rely completely on Codex for code reviews and actual coding. I will pick higher quality over speed every day. Please don’t change it, OpenAI team!

ifwinterco · last Thursday at 6:42 PM

I think the issue is that, for them, "quality, not speed" means "expensive, not cheap", and they can't pass that extra cost on to customers.

stared · yesterday at 10:31 AM

If you want to combine Claude Code's coding with stronger reasoning, it is easy to do with a plugin - https://github.com/stared/gemini-claude-skills - I wrote it for myself, but I'm sharing it in case anyone wants it. Some more context here: https://quesma.com/blog/claude-skills-not-antigravity/.
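
The core idea is tiny. A simplified Python sketch of what the skill wraps (not the plugin's actual code; the model name is a placeholder):

    from google import genai  # pip install google-genai

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    def reason_about(diff_text: str) -> str:
        # Send the change to a reasoning-heavy model and return its critique,
        # which Claude Code can then use while implementing.
        response = client.models.generate_content(
            model="gemini-2.5-pro",
            contents="Review this diff for bugs and inconsistencies:\n" + diff_text,
        )
        return response.text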

energy123 · yesterday at 5:48 AM

Second this, but for the chat subscription. Whatever they did with 5.2 compared to 5.0 in ChatGPT increased the test-time compute, and the quality shows. If only they would allow more tokens per prompt (it's currently capped at 46k for Plus). I don't touch Gemini 3.0 Pro now (I'm also subscribed there) unless I need the context length.

baseonmars · last Thursday at 9:49 PM

Absolutely second this. I'm mainly a Claude Code user, but I have Codex running in another tab for code reviews, and it's absolutely killer at analyzing flows and finding subtle bugs.

smoe · yesterday at 12:26 AM

Do you think that for someone who only needs careful, methodical identification of "problems" occasionally -- say, a couple of times per day -- the $20/month plan gets you anywhere? Or do you need the $200 plan just to get access to this?

apitman · last Thursday at 7:02 PM

It's annoying though because it keeps (accurately) pointing out critical memory bugs that I clearly need to fix rather than pretending they aren't there. It's slowing me down.

rane · yesterday at 7:45 AM

Exactly. This is why the workflow of consulting Gemini/Codex for the architecture and overall plan, and then having Claude implement the changes, is so powerful.
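
A minimal version of that loop, assuming the codex and claude CLIs are installed (a sketch only; flags vary by version, and the task prompt is just an example):

    import subprocess

    # 1) Ask Codex for the architecture/overall plan ("codex exec" runs non-interactively).
    plan = subprocess.run(
        ["codex", "exec", "Propose a plan for adding rate limiting to the API layer"],
        capture_output=True, text=True, check=True,
    ).stdout

    # 2) Hand the plan to Claude Code to implement ("-p" is non-interactive print mode).
    subprocess.run(["claude", "-p", "Implement this plan step by step:\n" + plan], check=True)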

jvermillard · yesterday at 6:37 AM

I use it mainly for embedded programming and I find Codex way better than Claude. I don't mind the delay anyway; I'm slower when writing carefully crafted C.

tgtweak · last Thursday at 6:39 PM

Anecdotally, I've found it very good in exactly this role for multi-agent workflows - as the "reviewer".
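
E.g. after the coding agent finishes, pipe the working diff to Codex purely as the reviewer. A rough sketch (flags may differ by version):

    import subprocess

    # Grab whatever the coding agent just changed...
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout

    # ...and ask Codex to act only as the reviewer.
    review = subprocess.run(
        ["codex", "exec", "Act as a code reviewer. Find subtle bugs in this diff:\n" + diff],
        capture_output=True, text=True,
    ).stdout
    print(review)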

kilroy123 · last Thursday at 6:50 PM

Interesting. What I've seen is that it spins and thinks forever, then just breaks. Which is beyond frustrating.

johnnyfived · yesterday at 12:13 AM

Agreed. I'm surprised by how much care the "extra high" reasoning allows. It easily catches bugs in code that other LLMs won't; using it to review Opus 4.5's output is highly effective.

garbagecoder · last Thursday at 11:08 PM

Agree. Codex just read the source code for a toy lisp I wrote in ARM64 assembly, learned how to code in that lisp, and wrote a few demo programs for me. That was impressive enough. Then it spent some time and effort really hunting down problems--there was a single bit-mask error in my garbage collector that hadn't been showing up until then. I was blown away. It's the kind of thing I would have spent forever trying to figure out before.
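
To give a flavor of the class of bug (a hypothetical Python reconstruction -- the real thing was a mask in ARM64 assembly):

    MARK_BIT = 0x1  # bit 0: set on live objects during the mark phase
    FWD_BIT = 0x2   # bit 1: used for something else entirely

    def is_marked_buggy(header: int) -> bool:
        # Wrong mask: live objects look unmarked and get swept.
        return bool(header & FWD_BIT)

    def is_marked_fixed(header: int) -> bool:
        # Test the actual mark bit.
        return bool(header & MARK_BIT)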

echelon · last Thursday at 8:09 PM

> If anyone from OpenAI is reading this

(unrelated, but piggybacking on requests to reach the teams)

If anyone from OpenAI or Google is reading this, please continue to make your image editing models work with the "previz-to-render" workflow.

Image edits should strongly infer pose and blocking, as if via an internal ControlNet, but should be able to upscale low-fidelity mannequins, cutouts, and plates/billboards.

OpenAI kicks ass at this (but could do better with style controls - if I give a Midjourney style ref, use it):

https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij...

https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U
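
For anyone who wants to try the workflow, it's just the standard image-edit endpoint. A minimal sketch with the OpenAI Python SDK (file names and prompt are placeholders):

    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY

    # Low-fidelity previz frame in, rendered frame out -- the model should keep
    # the pose/blocking and upscale the mannequins and billboards.
    result = client.images.edit(
        model="gpt-image-1",
        image=open("previz_frame.png", "rb"),
        prompt="Render this blocked-out scene photorealistically; preserve every "
               "character's pose, position, and the camera framing.",
    )

    with open("render.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))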

Google currently fails the tests, but can probably catch up easily:

https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd