Hacker News

mccoyb · last Thursday at 6:31 PM

If anyone from OpenAI is reading this -- a plea to not screw with the reasoning capabilities!

Codex is so, so good at finding bugs and little inconsistencies, it's astounding to me. Where Claude Code is good at "raw coding", Codex/GPT-5.x is unbeatable at the careful, methodical finding of "problems" (be it in code or in math).

Yes, it takes longer (quality, not speed please!) -- but the things that it finds consistently astound me.


Replies

sinatra · yesterday at 12:59 AM

Piggybacking on this post. Codex not only finds much higher-quality issues, it also writes code that usually doesn't leave quality issues behind. Claude is much faster, but it definitely leaves serious quality issues behind.

So much so that now I rely completely on Codex for code reviews and actual coding. I will pick higher quality over speed every day. Please don’t change it, OpenAI team!

ifwinterco · last Thursday at 6:42 PM

I think the issue is that, for them, "quality, not speed" means "expensive, not cheap", and they can't pass that extra cost on to customers.

stared · yesterday at 10:31 AM

If you want to combine Claude Code's coding with stronger reasoning, it is easy to do with a plugin - https://github.com/stared/gemini-claude-skills - I wrote it for myself, but I'm sharing it in case anyone wants it. Some more context here: https://quesma.com/blog/claude-skills-not-antigravity/.
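
The core idea is tiny. A simplified Python sketch of what the skill wraps (not the plugin's actual code; the model name is a placeholder):

    from google import genai  # pip install google-genai

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    def reason_about(diff_text: str) -> str:
        # Send the change to a reasoning-heavy model and return its critique,
        # which Claude Code can then use while implementing.
        response = client.models.generate_content(
            model="gemini-2.5-pro",
            contents="Review this diff for bugs and inconsistencies:\n" + diff_text,
        )
        return response.text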

energy123 · yesterday at 5:48 AM

Second this, but for the chat subscription. Whatever they did with 5.2 compared to 5.0 in ChatGPT increased the test-time compute, and the quality shows. If only they would allow more tokens per prompt (it's currently capped at 46k for Plus). I don't touch Gemini 3.0 Pro now (I'm also subscribed there) unless I need the context length.

baseonmars · last Thursday at 9:49 PM

Absolutely second this. I'm mainly a Claude Code user, but I have Codex running in another tab for code reviews, and it's absolutely killer at analyzing flows and finding subtle bugs.

smoe · yesterday at 12:26 AM

Do you think that for someone who only needs careful, methodical identification of "problems" occasionally -- say, a couple of times per day -- the $20/month plan gets you anywhere? Or do you need the $200 plan just to get access to this?

apitman · last Thursday at 7:02 PM

It's annoying though because it keeps (accurately) pointing out critical memory bugs that I clearly need to fix rather than pretending they aren't there. It's slowing me down.

rane · yesterday at 7:45 AM

Exactly. This is why the workflow of consulting Gemini/Codex for the architecture and overall plan, and then having Claude implement the changes, is so powerful.
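
A minimal version of that loop, assuming the codex and claude CLIs are installed (a sketch only; flags vary by version, and the task prompt is just an example):

    import subprocess

    # 1) Ask Codex for the architecture/overall plan ("codex exec" runs non-interactively).
    plan = subprocess.run(
        ["codex", "exec", "Propose a plan for adding rate limiting to the API layer"],
        capture_output=True, text=True, check=True,
    ).stdout

    # 2) Hand the plan to Claude Code to implement ("-p" is non-interactive print mode).
    subprocess.run(["claude", "-p", "Implement this plan step by step:\n" + plan], check=True)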

jvermillard · yesterday at 6:37 AM

I use it mainly for embedded programming and I find Codex way better than Claude. I don't mind the delay anyway; I'm slower when writing carefully crafted C.

tgtweak · last Thursday at 6:39 PM

Anecdotally, I've found it very good in exactly this role for multi-agent workflows - as the "reviewer".
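
E.g. after the coding agent finishes, pipe the working diff to Codex purely as the reviewer. A rough sketch (flags may differ by version):

    import subprocess

    # Grab whatever the coding agent just changed...
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout

    # ...and ask Codex to act only as the reviewer.
    review = subprocess.run(
        ["codex", "exec", "Act as a code reviewer. Find subtle bugs in this diff:\n" + diff],
        capture_output=True, text=True,
    ).stdout
    print(review)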

kilroy123 · last Thursday at 6:50 PM

Interesting. What I've seen is that it spins and thinks forever, then just breaks. Which is beyond frustrating.

johnnyfived · yesterday at 12:13 AM

Agreed. I'm surprised by how much care the "extra high" reasoning allows. It easily catches bugs in code that other LLMs won't; using it to review Opus 4.5's output is highly effective.

garbagecoder · last Thursday at 11:08 PM

Agree. Codex just read the source code for a toy lisp I wrote in ARM64 assembly, learned how to code in that lisp, and wrote a few demo programs for me. That was impressive enough. Then it spent some time and effort really hunting down problems--there was a single bit-mask error in my garbage collector that hadn't been showing up until then. I was blown away. It's the kind of thing I would have spent forever trying to figure out before.
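
To give a flavor of the class of bug (a hypothetical Python reconstruction -- the real thing was a mask in ARM64 assembly):

    MARK_BIT = 0x1  # bit 0: set on live objects during the mark phase
    FWD_BIT = 0x2   # bit 1: used for something else entirely

    def is_marked_buggy(header: int) -> bool:
        # Wrong mask: live objects look unmarked and get swept.
        return bool(header & FWD_BIT)

    def is_marked_fixed(header: int) -> bool:
        # Test the actual mark bit.
        return bool(header & MARK_BIT)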

echelon · last Thursday at 8:09 PM

> If anyone from OpenAI is reading this

(unrelated, but piggybacking on requests to reach the teams)

If anyone from OpenAI or Google is reading this, please continue to make your image editing models work with the "previz-to-render" workflow.

Image edits should strongly infer pose and blocking, as if via an internal ControlNet, but should be able to upscale low-fidelity mannequins, cutouts, and plates/billboards.

OpenAI kicks ass at this (but could do better with style controls - if I give a Midjourney style ref, use it):

https://imgur.com/gallery/previz-to-image-gpt-image-1-x8t1ij...

https://imgur.com/a/previz-to-image-gpt-image-1-5-3fq042U
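
For anyone who wants to try the workflow, it's just the standard image-edit endpoint. A minimal sketch with the OpenAI Python SDK (file names and prompt are placeholders):

    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY

    # Low-fidelity previz frame in, rendered frame out -- the model should keep
    # the pose/blocking and upscale the mannequins and billboards.
    result = client.images.edit(
        model="gpt-image-1",
        image=open("previz_frame.png", "rb"),
        prompt="Render this blocked-out scene photorealistically; preserve every "
               "character's pose, position, and the camera framing.",
    )

    with open("render.png", "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))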

Google currently fails the tests, but can probably catch up easily:

https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd