What is the reason behind OpenAI being able to release new models very fast?
Since February, when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex, we have seen GPT-5.4 and GPT-5.5, but only Opus 4.7 and no new Gemini model.
Both of these are pretty decent improvements.
"our strongest set of safeguards to date"
How much capability is lost by hobbling models with a zillion protections against idiots?
Every prompt gets evaluated to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...
Maybe just...leave that all off? I know, I know, individual responsibility no longer exists, but I can dream.
So, according to the benchmarks, somewhere in between Opus 4.7 and Mythos.
"Sometime with GPT-5.5 I become lazy"
I don't want to be lazy.
If SWE-Bench Verified is no longer a good measure of agentic coding abilities, what benchmark now is?
Is there anywhere I can try it? (I just stopped my Pro sub.) I was wondering if there's a playground or third party where I can test it briefly.
This is the first time OpenAI has included competing models in its benchmarks; previously it included only its own models.
For those using GPT-5.5: how does it compare to Opus 4.6 / 4.7 in terms of code generation?
Surprised to see SWE-Bench Pro only a slight improvement (57.7% -> 58.6%) while Opus 4.7 hit 64.3%. I wonder what Anthropic is doing to achieve higher scores here, and what makes this test particularly hard to do well on compared to Terminal Bench (where 5.5 seemed to have a big jump).
ctrl+f "cutoff, 0 results"
Surely it doesn't still have the same ancient data cutoff as 5.4 did?
Entering this comment section wondering if it will be full of complaints about the new personality, as with every single LLM update.
How does it compare to Mythos?
I just prompted GPT-5.5 Pro "Solve Nuclear Fusion" and it one shotted it (kidding obviously)
Which is better GPT-5.5 or Opus 4.7? And for what tasks?
> We are releasing GPT‑5.5 with our strongest set of safeguards to date
...
> we’re deploying stricter classifiers for potential cyber risk which some users may find annoying initially
So we should expect not to be able to check our own code for vulnerabilities, because the model inherently cannot know whether I'm feeding it my code or someone else's.
It's possible that "smarter" AI won't lead to more productivity in the economy. Why?
Because software and "information technology" generally didn't increase productivity over the past 30 years.
This has long been known as Solow's productivity paradox. There are lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.
But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you more lazy.
AI's main application has been information space so far. If that continues, I doubt you will get more productivity from it.
If you give AI a body... well, maybe that changes.
I might just be following too many AI-related people on X, but omg the media blitz around 5.5 is aggressive.
Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".
I prefer to see for myself, but the gradual rollout, combined with a full-on marketing campaign, is annoying.
Does it have cached-input pricing?
I hear it's as good as Opus 4.7.
The battle has just begun
Nice to see them openly compare to Opus-4.7… but they don't compare it against Mythos, which says everything you need to know.
The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they’ll be too busy posting slop content about how “GPT-5.5 changes everything”.
Literally cannot launch the Codex app anymore.
Good timing; I had just renewed my subscription.
> One engineer at NVIDIA who had early access to the model went as far as to say: "Losing access to GPT‑5.5 feels like I've had a limb amputated."
Everybody understands that you need to make money, but can you tone it down with the f*cking FOMO, please? It sounds just pathetic at this point:
"one engineer at NVIDIA", "limb amputated".
Put the cunt in a room and give me a handsaw, I want to see how fast he'll give up his arm over some cloud model.
I'd really like to see improvements like these:
- Some technical proof that data is never read by OpenAI.
- Proof that no logs of my data or derived data are saved.
- etc...
> A playable 3D dungeon arena
Where's the demo link?
... sigh. I realize there's little that can be done about this, but I just got through a real-world session determining whether Opus 4.7 is meaningfully better than Opus 4.6 or GPT-5.4, and now there's another one to try things with. These benchmark results generally mean little to me in practice.
Anyways, still exciting to see more improvements.
What do major and minor semver mean for these models? Is each minor release a new fine-tuning on a new subset of example data, while major releases are trained from scratch? Or do they even mean anything at this point?
GPT-5.4 is already an incredible model for code reviews and security audits with the swival.dev /audit command.
The fact that GPT-5.5 is apparently even better at long-running tasks is very exciting. I don’t have access to it yet, but I’m really looking forward to trying it.
Related and insightful: "GPT-5.5: Mythos-Like Hacking, Open to All" [1].
Is Codex receiving the 5.4 or 5.5 release?
I'm still using Codex 5.3 and haven't switched to GPT-5.4, as I don't like the "it's automatic, bro, trust us" approach, so I'm wondering whether Codex will get these specific releases at all in the future.
My impression has been that ChatGPT-5.4 has been getting dumber and more exhausting in the last couple of weeks. It makes a lot of obvious mistakes, ignores (parts of) prompts, and keeps forgetting important facts or requirements.
Maybe this is a crazy theory, but I sometimes feel like they gimp their existing models before a big release so you'll notice more of a "step".
I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena to look at the actual head-to-head comparisons.
I am sceptical. The generations after 4o have become crappier and crappier. I hope this one changes the trend. 5.4 is unusable for complex coding work.
I'm still using 5.3 in codex. Are 5.4 and 5.5 better than 5.3 in concrete ways?
Is this the first time OpenAI compared their new release to Anthropic models? Previously they were comparing only to GPT's own previous versions.
ARC-AGI 3 is missing from this list. Given that the SOTA before 5.5 was <1%, if I recall correctly, I wonder whether it made meaningful progress there.
Not rolled out to my Codex CLI yet, but some users on Reddit claim it's on theirs.
Next up: Google I/O on May 19?
I have to imagine they'll go to Gemini 3.5 if only for marketing reasons.
If anyone tried it already, how do you feel?
Numbers look too good; wondering whether it's benchmaxxed or not.
they are using ethical training weights this time!!! /j
Oh shiiiiit boy! An incrementation dropped!!
finally
Umm yeah but this is like every release in the last 3 years.
The big question is: does it still just write slop, or not?
Fool me once, fool me twice, fool me for the 32nd time, it’s probably still just slop.