Codex gpt-5.5 is far superior to Opus 4.7 when working on large projects

grodes • today at 2:01 PM • 11 replies

Replies

I strongly believe gpt-5.x performs so well on large projects because of the focused training they've done on their dedicated apply_patch primitive.

The official implementation of apply_patch is well thought out. It is a two-phase process that makes no changes at all until every file in the change set can be patched unambiguously. The pre-commit error feedback usually lets the model fix anchoring issues within one or two additional attempts. It generally goes something like:

  Reading file A L1:154
  Reading file B L1:123
  Attempting to apply patch... 
  [anchor errors for both A & B]
  Reading file A L43:67
  Reading file B L50:74
  Attempting to apply patch... 
  Patch succeeded! Running compilation & unit tests...

The anchor error feedback helps massively because in this implementation it also returns the current line numbers where the problem was found.
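To make the two-phase idea concrete, here is a rough sketch of how I picture it. This is not the official apply_patch code: the `Hunk` type, `find_anchor`, `apply_change_set`, and the error wording are all names I made up for illustration, and it assumes hunks in the same file don't overlap.

```python
from dataclasses import dataclass


@dataclass
class Hunk:
    path: str
    old: list[str]  # context lines expected in the file (the anchor)
    new: list[str]  # lines that replace the anchor


def find_anchor(lines: list[str], old: list[str]) -> list[int]:
    """Return every 1-based line number where `old` matches contiguously."""
    n = len(old)
    return [i + 1 for i in range(len(lines) - n + 1) if lines[i:i + n] == old]


def apply_change_set(files: dict[str, list[str]], hunks: list[Hunk]):
    """Phase 1 validates every hunk; phase 2 commits only if all are unambiguous."""
    errors, planned = [], []
    for h in hunks:
        matches = find_anchor(files.get(h.path, []), h.old)
        if len(matches) == 1:
            planned.append((h, matches[0]))
        elif not matches:
            errors.append(f"{h.path}: anchor not found")
        else:
            # Report current line numbers so the caller knows where to re-read.
            errors.append(f"{h.path}: anchor ambiguous at lines {matches}")
    if errors:
        return files, errors  # nothing was modified

    result = {path: list(lines) for path, lines in files.items()}
    # Apply bottom-up so line numbers found in phase 1 stay valid within a file.
    for h, start in sorted(planned, key=lambda p: p[1], reverse=True):
        result[h.path][start - 1:start - 1 + len(h.old)] = h.new
    return result, []
```

If any hunk fails validation, the caller gets the untouched files back plus the error list, re-reads the reported lines, and retries with more context in `old`.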

Techniques that replace the whole file or depend on find-replace are useful in more isolated contexts. However, when you need to refactor 20+ files, something like apply_patch is what you want. Anything that depends on specific line numbers for actual replacement targets is a total dead end for complex edit scenarios.

https://developers.openai.com/api/docs/guides/tools-apply-pa...

lucamark • today at 2:31 PM

I'm experiencing the same. Codex gpt-5.5 has sharper intuitions and writes less code, i.e. it identifies the exact point where the modification should be made. That said, there are huge improvements in personality from Opus 4.7 (it was too accommodating) to Opus 4.8.

meowface • today at 2:06 PM

GPT-5.5 is the better programmer but Opus 4.8 remains the better system architect and product designer.

Codex is very "miss the forest for the trees", but is much better at successfully making large changes in large codebases. Claude Code makes more mistakes, but has more taste and a better grasp on idiomatic and elegant software development.

If you can afford to, I recommend juggling both.

vb-8448 • today at 3:11 PM

My problem with codex/gpt is that it is too verbose (mostly JS and Python): a lot of helper functions, a lot of one- or two-line functions used in only one place, a lot of types or proxy-like objects.

I have specific skills set up to try to avoid this, but I still spend half of my time fighting its verbosity.

Currently, I'm trying to scaffold the functions/classes I know I need with NotImplemented and ask it to implement only inside those specific places. It's a little bit better, but I still have to fight with nested function definitions ...
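For Python, the scaffold looks roughly like this. The class and method names (`InvoiceParser`, `parse`, `total`) are made up for the example; the only real detail is that the exception to raise is `NotImplementedError`.

```python
class InvoiceParser:
    """Hand-written skeleton; the model is asked to fill in the bodies only."""

    def parse(self, raw: str) -> dict:
        # Placeholder body: replace in place, no extra helpers or nested functions.
        raise NotImplementedError("parse: to be implemented")

    def total(self, invoice: dict) -> float:
        # Placeholder body: replace in place, no extra helpers or nested functions.
        raise NotImplementedError("total: to be implemented")
```

Any stub that has not been filled in yet fails loudly when called, so it is easy to see what is still missing.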

RA_Fisher • today at 2:08 PM

In what ways? LM Arena has Opus 4.7 at 1567 ± 7 vs. 1505 ± 10 for GPT-5.5 Codex in code. I'm currently using both.

Admittedly my recent experience tilts toward Opus, now 4.8, but you and others have piqued my interest in GPT-5.5 Codex, so I'm trying that more now.

the__alchemist • today at 2:16 PM

You're using last week's model; Opus 4.7 is old news. Opus 6.9 is the new hotness; it is a better product manager than GPT, and has more X productivity. It replaced our junior dev team, and tells me my hair looks good.

dangus • today at 2:12 PM

Opus 4.7 is not the current version of Opus.

BoredPositron • today at 2:03 PM

Not everyone is a developer...

sergiotapia • today at 4:06 PM

My experience as well. Although this week I've moved to Cursor and Composer 2.5. It's so fast that any faults can be iterated on super quickly. The model is just insanely good with code things.

Keyframe • today at 4:41 PM

source?

oofbey • today at 2:13 PM

GPT 5.5 still invents facts rather than looking them up, and manages to come across as both condescending and sycophantic. It feels like talking to a used car salesman.
