Unfortunately the paper doesn't include GPT 5.3, which was released around the same time as Opus 4.6, or GPT 5.4 from a few days back. Both are available via API:
https://developers.openai.com/api/docs/models/gpt-5.3-codex
IMHO the native harness must be used when running these experiments. The model vendors know best how to build the harness for their own model (GPT 5.4 with Codex, or Opus 4.6 with Claude Code), and that makes a big difference in any kind of agentic coding task.
I see Claude and GPT as neck and neck in coding; every other model+harness combo is definitely 3-6 months behind. Right now Codex seems best at solving complex bugs and long-running tasks, with much higher limits and even better speed, while Claude does well on front-end work and its CLI UX is nice. The Codex app is very good too (I wish it weren't an Electron memory hog, but it's good).
Are you saying they did not use native harnesses like Claude Code or Codex? How did they do it then?
> model vendors know best on giving the best harness
For a while this was only true of Claude Code: Codex was poor and Gemini was unusable.
Since then Codex has gotten quite good.