Hacker News

GPT-5.2

955 points by atgctg yesterday at 6:04 PM | 802 comments

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...


Comments

elAhmo today at 8:28 AM

This feels like "could've been an email" type of thing, a very incremental update that just adds one more version. I bet there is literally no one in the world who wanted *one more version of GPT* in the list of available models from OpenAI.

"All models" section on https://platform.openai.com/docs/models is quite ridiculous.

m12k today at 8:29 AM

So, does 5.2 still have a knowledge cutoff date of June 2024, or have they managed to complete another full pre-training run?

svara today at 8:08 AM

In my experience, the best models are already nearly as good as they can be for a large fraction of what I personally use them for, which is basically as a more efficient search engine.

The thing that would now make the biggest difference isn't "more intelligence", whatever that might mean, but better grounding.

It's still a big issue that the models will make up plausible sounding but wrong or misleading explanations for things, and verifying their claims ends up taking time. And if it's a topic you don't care about enough, you might just end up misinformed.

I think Google/Gemini realize this, since their "verify" feature is designed to address exactly this. Unfortunately it hasn't worked very well for me so far.

But to me it's very clear that the product that gets this right will be the one I use.

jbkkd today at 8:18 AM

A new model doesn't address the fundamental reliability issues with OpenAI's enterprise tier.

As an enterprise customer, the experience has been disappointing. The platform is unstable, support is slow to respond even when escalated to account managers, and the UI is painfully slow to use. There are also baffling feature gaps, like the lack of connectors for custom GPTs.

None of the major providers have a perfect enterprise solution yet, but given OpenAI's market position, the gap between expectations and delivery is widening.

CodeCompost today at 7:50 AM

For the first time, I've actually hidden an AI story on HN.

I can't even anymore. Sorry this is not going anywhere.

breakingcups yesterday at 6:37 PM

Is it me, or did it still get at least three placements of components (RAM and PCIe slots, plus it's DisplayPort and not HDMI) in the motherboard image[0] completely wrong? Why would they use that as a promotional image?

0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...

goobatrooba yesterday at 9:22 PM

I feel there is a point at which all these benchmarks become meaningless. What I care about beyond decent performance is the user experience. There I have grudges against every single platform, and the one thing keeping me as a paid ChatGPT subscriber is the ability to sort chats into "projects" with associated files (hello Google, please wake up to basic user-friendly organisation!)

But all of them:

- Lie far too often with confidence

- Refuse to stick to prompts (e.g. ChatGPT to the request to number each reply for easy cross-referencing; Gemini to a basic request to respond in a specific language)

- Refuse to express uncertainty or nuance (I asked ChatGPT to give me certainty %s, which it did for a while but then just forgot...?)

- Refuse to give me short answers without fluff or follow-up questions

- Refuse to stop complimenting my questions or disagreements with wrong/incomplete answers

- Don't quote sources consistently so I can check facts, even when I ask for it

- Refuse to make clear whether they rely on original documents or an internal summary of the document, until I point out errors

- ...

I also have substance gripes, but for me such basic usability points are really something all of the chatbots fail on abysmally. Stick to instructions! Stop creating walls of text for simple queries! Tell me when something is uncertain! Tell me if there's no data or info rather than making something up!

zone411 yesterday at 7:46 PM

I've benchmarked it on the Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/):

The high-reasoning version of GPT-5.2 improves on GPT-5.1: 69.9 → 77.9.

The medium-reasoning version also improves: 62.7 → 72.1.

The no-reasoning version also improves: 22.1 → 27.5.

Gemini 3 Pro and Grok 4.1 Fast Reasoning still score higher.

agentifysh yesterday at 10:21 PM

Looks like they've begun censoring posts at r/Codex and not allowing complaint threads, so here is my honest take:

- It is faster, which is appreciated, but not as fast as Opus 4.5

- I see no changes, very little noticeable improvement over 5.1

- I do not see any value in exchange for +40% in token costs

All in all, I can't help but feel that OpenAI is facing an existential crisis. Gemini 3, even when it's used from AI Studio, offers close to ChatGPT Pro performance for free. Anthropic's Claude Code at $100/month is tough to beat. I am using Codex with the $40 credits, but there's been a silent increase in token costs and usage limitations.

simonw yesterday at 7:01 PM

Wow, there's a lot going on with this pelican riding a bicycle: https://gist.github.com/simonw/c31d7afc95fe6b40506a9562b5e83...

rallies today at 7:35 AM

I work at the intersection of AI and investing, and I'm really amazed at the ability of this model to build spreadsheets.

I gave it a few tools to access SEC filings (and a small local vector database), and it's generating full-fledged spreadsheets with valid, real-time data. Analysts on Wall Street are going to get really empowered, but for the first time, I'm really glad that retail investors are also getting these models.

Just put out the tool: https://github.com/ralliesai/tenk
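
For context, "giving it tools" here means function calling. Below is a minimal sketch of what such a tool definition might look like; the schema, names, and model string are hypothetical and not taken from the tenk repo:

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical tool schema; the actual tenk implementation may differ.
    tools = [{
        "type": "function",
        "function": {
            "name": "fetch_sec_filing",
            "description": "Fetch a section of a company's SEC filing.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string"},
                    "form": {"type": "string", "enum": ["10-K", "10-Q", "8-K"]},
                    "section": {"type": "string"},
                },
                "required": ["ticker", "form"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-5.2",  # model name assumed
        messages=[{"role": "user", "content": "Build a DCF sheet for AAPL."}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)

The model decides when to call the tool; the host code executes the call and feeds the result back.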

mmaunder yesterday at 9:58 PM

Weirdly, the blog announcement completely omits the actual new context window size, which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

Can I just say !!!!!!!! Hell yeah! Blog post indicates it's also much better at using the full context.

Congrats OpenAI team. Huge day for you folks!!

Started on Claude Code and like many of you, had that omg CC moment we all had. Then got greedy.

Switched over to Codex when 5.1 came out. WOW. Really nice acceleration in my Rust/CUDA project which is a gnarly one.

Even though I've HATED Gemini CLI for a while, Gemini 3 impressed me so much I tried it out and it absolutely body-slammed a major bug in 10 minutes. Started using it to consult on commits. Was so impressed it became my daily driver. Huge mistake. I almost lost my mind after a week of fighting it. Insane bias towards action. Ignoring user instructions. Garbage characters in output. Absolutely no observability into its thought process. And on and on.

Switched back to Codex just in time for 5.1 codex max xhigh which I've been using for a week, and it was like a breath of fresh air. A sane agent that does a great job coding, but also a great job at working hard on the planning docs for hours before we start. Listens to user feedback. Observability on chain of thought. Moves reasonably quickly. And also makes it easy to pay them more when I need more capacity.

And then today GPT-5.2 with an xhigh mode. I feel like Xmas has come early. Right as I'm doing a huge Rust/CUDA/math-heavy refactor. THANK YOU!!

onraglanroad yesterday at 9:06 PM

I suppose this is as good a place as any to mention this. I've now met two different devs who complained about the weird responses from their LLM of choice, and it turned out they were using a single session for everything: from recipes for the night, to presents for the wife, and then into programming issues the next day.

Don't do that. The whole context is sent on queries to the LLM, so start a new chat for each topic. Or you'll start being told what your wife thinks about global variables and how to cook your Go.

I realise this sounds obvious to many people but it clearly wasn't to those guys so maybe it's not!
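
For anyone wondering why this happens: chat "memory" within a session is just the client re-sending the whole transcript on every turn. A minimal sketch (model name assumed):

    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(question: str) -> str:
        # Every request re-sends the ENTIRE history, so last night's
        # recipes and gift ideas ride along into today's debugging session.
        history.append({"role": "user", "content": question})
        resp = client.chat.completions.create(model="gpt-5.2", messages=history)
        answer = resp.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

A fresh chat is just an empty history list, which is why one-topic-per-session keeps answers on track.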

nbardy yesterday at 9:56 PM

Those ARC-AGI-2 improvements are insane.

That's especially encouraging to me because those are all about generalization.

5 and 5.1 both felt overfit and would break down and be stubborn when you got them outside their lane. As opposed to Opus 4.5, which is lovely at self-correcting.

It's one of those things you really feel in the model: not whether it can tackle a harder problem, but whether I can go back and forth with this thing, learning and correcting together.

This whole release is insanely optimistic for me. If they can push this much improvement WITHOUT the huge new data centers and without a new scaled base model, that's incredibly encouraging for what comes next.

Remember, the next big data centers are 20-30x the chip count, with 6-8x the efficiency on the new chips.

I expect they can saturate the benchmarks WITHOUT any novel research and algorithmic gains. But at this point it's clear they're capable of pushing research qualitatively as well.

jumploops yesterday at 6:58 PM

> “a new knowledge cutoff of August 2025”

This (and the price increase) points to a new pretrained model under the hood.

GPT-5.1, in contrast, was allegedly using the same pretraining as GPT-4o.

xd1936 yesterday at 6:44 PM

> While GPT‑5.2 will work well out of the box in Codex, we expect to release a version of GPT‑5.2 optimized for Codex in the coming weeks.

https://openai.com/index/introducing-gpt-5-2/

preetamjinka yesterday at 6:45 PM

It's actually more expensive than GPT-5.1. I've gotten used to prices going down with each new model, but this time it's gone up.

https://platform.openai.com/docs/pricing

zug_zug yesterday at 6:34 PM

For me the last remaining killer feature of ChatGPT is the quality of the voice chat. Do any of the competitors have something like that?

snake_doc yesterday at 10:24 PM

> Models were run with maximum available reasoning effort in our API (xhigh for GPT‑5.2 Thinking & Pro, and high for GPT‑5.1 Thinking), except for the professional evals, where GPT‑5.2 Thinking was run with reasoning effort heavy, the maximum available in ChatGPT Pro. Benchmarks were conducted in a research environment, which may provide slightly different output from production ChatGPT in some cases.

Feels like a Llama 4 type release. Benchmarks are not apples to apples: reasoning effort is higher across the board, thus using more compute to achieve a higher score on benchmarks.

Also, it notes that some results may not be reproducible.

Also, the vision benchmarks all use a Python tool harness, and they exclude scores that are low without the harness.
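
For reference, reasoning effort is a per-request knob in the API, which is why cross-model scores only mean something when it's held constant. A minimal sketch of pinning it explicitly (assuming "xhigh" is an accepted value for GPT-5.2, as the quoted note implies):

    from openai import OpenAI

    client = OpenAI()

    # Re-run the same prompt at two efforts; scores move with the knob.
    for effort in ("high", "xhigh"):
        resp = client.responses.create(
            model="gpt-5.2",  # model name assumed
            reasoning={"effort": effort},
            input="Solve today's NYT Connections puzzle: ...",
        )
        print(effort, resp.output_text)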

tenpoundhammer yesterday at 8:39 PM

I have been using ChatGPT a ton over the last months and paying the subscription. Used it for coding, news, stock analysis, daily problems, and whatever else I could think of. I decided to give Gemini a go when version three came out to great reviews. Gemini handles every single one of my use cases much better and consistently gives better answers. This is especially true for situations where searching the web for current information is important; makes sense that Google would be better. Also, OCR is phenomenal: ChatGPT can't read my bad handwriting but Gemini can easily.

The only downsides are in the polish department: there are more app bugs, and I usually have to leave the app open or the session terminates. There are bugs with uploading photos. The biggest complaint is that all links get inserted into Google search and then I have to manipulate them when they should go directly to the chosen website; this has to be some kind of internal org KPI nonsense.

Overall, my conclusion is that ChatGPT has lost and won't catch up because of the search integration strength.

josalhor yesterday at 6:24 PM

From GPT-5.1 Thinking:

ARC-AGI-2: 17.6% -> 52.9%

SWE-bench Verified: 76.3% -> 80%

That's pretty good!

flkiwi yesterday at 9:58 PM

I gave up my OpenAI subscription a few days ago in favor of Claude. My quality of life (and quality of results) has gone up substantially. Several of our tools at work have GPT-5x as their backend model, and it is incredible how frustrating they are to use, how predictable their AI-isms are, and how inconsistent their output is. OpenAI is going to have to do a lot more than an incremental update to convince me they haven't completely lost the thread.

blitz_skull today at 1:10 AM

Again I just tap the sign.

All of your benchmarks mean nothing to me until you include Claude Sonnet on them.

In my experience, GPT hasn’t been able to compete with Claude in years for the daily “economically valuable” tasks I work on.

youngermax today at 12:35 AM

Isn't it interesting how this incremental release includes so many testimonials from companies who claim the model has improved? It also focuses on "economically valuable tasks." There was nothing of this sort in GPT-5.1's release. Looks like OpenAI is feeling the pressure from investors now.

tpurves yesterday at 10:22 PM

Undoubtedly each new model from OpenAI has numerous training and orchestration improvements, etc.

But how much of each product they release is also just a factor of how much they are willing to spend on inference per query in order to stay competitive?

I always wonder how much is technical change vs. turning a knob up and down on hardware and power consumption.

GPT-5.0, for example, seemed like a lot of changes more for OpenAI's internal benefit (terser responses, a dynamic 'auto' mode to scale down thinking when not required, etc.)

Wondering if GPT-5.2 is also a case of them, in 'code red' mode, just turning what they already have up to 11 as the fastest way to respond to fiercer competition.

doctoboggan yesterday at 6:24 PM

This seems like another "better vibes" release. With the number of benchmarks exploding, random luck means you can almost always find a couple showing what you want to show. I didn't see much concrete evidence this was noticeably better than 5.1 (or even 5.0).

Being a point release, though, I guess that's fair. I suspect there are also some decent optimizations on the backend that make it cheaper and faster for OpenAI to run, and those are the real reasons they want us to use it.

Tiberium yesterday at 6:35 PM

The only table where they showed comparisons against Opus 4.5 and Gemini 3:

https://x.com/OpenAI/status/1999182104362668275

https://i.imgur.com/e0iB8KC.png

minadotcom yesterday at 6:29 PM

They used to compare to competing models from Anthropic, Google DeepMind, DeepSeek, etc. Seems that now they only compare to their own models. Does this mean that the GPT-series is performing worse than its competitors (given the "code red" at OpenAI)?

nezaj today at 12:54 AM

We saw it do better at making Counter-Strike! https://x.com/instant_db/status/1999278134504620363?s=20

ComputerGuru yesterday at 7:02 PM

Wish they would include or leak more info about what this is, exactly. 5.1 was just released, yet they are claiming big improvements (on benchmarks, obviously). Did they purposely not release the best they had to keep some cards to play in case of Gemini 3 success or is this a tweak to use more time/tokens to get better output, or what?

rishabhaiover today at 3:40 AM

After I saw Opus 4.5 search through Zig's std.io because it wasn't aware of a breaking change in the recent release, I fell in love with claude-code, and I don't see a strong enough reason to switch to codex at the moment.

dumbmrblah yesterday at 7:29 PM

Great! It'll be SOTA for a couple of weeks until the quality degrades due to throttling.

I'll stick with the plug-and-play API instead.

sigmar yesterday at 6:39 PM

Are there any specifics about how this was trained? Especially when 5.1 is only a month old. I'm a little skeptical of benchmarks these days and wish they'd put this up on LMArena.

edit: noticed 5.2 is ranked in the WebDev Arena (#2, tied with gemini-3.0-pro), but not yet in the text arena (last update 22 hrs ago)

sfmike yesterday at 6:21 PM

Everything is still based on GPT-4o, right? Is training a new base model just too expensive? Maybe they could consult the DeepSeek team about cost-constrained new models.

ImprobableTruth yesterday at 6:38 PM

An almost 50% price increase. Benchmarks look nice, but 50% nicer...?

devinprater yesterday at 6:58 PM

Can the tables have column headers so my screen reader can read the model name as I go across the benchmarks? And the images should have alt text.

whereistejas yesterday at 11:46 PM

Did anyone notice how Cursor wasn’t an early tester? I wonder why…

mattas yesterday at 6:32 PM

Are benchmarks the right way to measure LLMs? Not because benchmarks can be gamed, but because the most useful outputs of models aren't things that can be bucketed into "right" and "wrong." Tough problem!

SkyPuncher yesterday at 7:26 PM

Given the price increase and speculation that GPT-5 is a MoE model, I'm wondering if they're simply "turning up the good stuff" without making significant changes under the hood.

xmcqdpt2 today at 2:04 AM

I don't know if they used the new ChatGPT to translate this page, but I was served the French version and it is NOT good. There are placeholders for quotes like <quote>, and the prose is incredibly repetitive. You'd figure that OpenAI of all people would be able to translate something into one of the world's most spoken languages.

a_wild_dandan yesterday at 7:24 PM

> Unlike the previous GPT-5.1 model, GPT-5.2 has new features for managing what the model "knows" and "remembers to improve accuracy.

Dumb nit, but why not put your own press release through your model to prevent basic things like missing quote marks? Reminds me of that time OAI released wildly inaccurate copy/pasted bar charts.

getnormality today at 6:22 AM

Sweet Jesus. 53% on ARC-AGI-2. There's still gas in this van.

vishal_new today at 7:43 AM

Hmmm, is there any insight into whether these are really getting much better at coding? Will hand coding be dead within a few years, with humans just typing in English?

ofermend today at 5:25 AM

GPT-5.2 was just added to the Vectara Hallucination Leaderboard. Definitely an improvement over GPT-5.1 - congrats to the team.

https://github.com/vectara/hallucination-leaderboard

ClipNoteBook yesterday at 10:22 PM

ChatGPT seems to just randomly pick URLs to cite and extract information from. Google Gemini seems to look at heuristics like whether the author is trustworthy, or an expert in the topic. But more advanced

hbarka yesterday at 9:41 PM

A year ago Sundar Pichai declared code red, now it's Sam Altman declaring code red. How the tables have turned, and I think the acquisition of Windsurf and Kevin Hou by Google seems to correlate with their level-up.

namesbc today at 4:39 AM

So the rosy, biased estimate is that OpenAI is saving 1 hour of work per day, so 5 hours total per work week and 20 hours total per month.

With a subsidized cost of $200/month for OpenAI, it would be cheaper to hire a part-time minimum-wage worker than it would be to contract with OpenAI.

And that is the rosiest estimate OpenAI has.
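
The arithmetic, for what it's worth (assuming the US federal minimum wage of $7.25/hr):

    # Back-of-envelope check of the claim above.
    hours_saved_per_month = 20
    minimum_wage = 7.25  # USD/hour, US federal minimum
    part_timer_cost = hours_saved_per_month * minimum_wage
    print(part_timer_cost)        # 145.0
    print(part_timer_cost < 200)  # True: the part-timer is cheaper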

lend000 today at 4:25 AM

It seems like they fixed the most obvious issue with the last release, where Codex would just refuse to do its job... if it seemed difficult or context usage was getting above 60% or so. Good job on the post-training improvements.

The benchmark changes are incredible, but I have yet to notice a difference in my codebases.

byt3bl33d3r yesterday at 11:59 PM

There's really no point in looking at benchmarks anymore, as real-world usage of these models varies between tasks and prompting strategies. Use your internal benchmarks to evaluate and ignore everything else. It is curious to me how they don't provide a side-by-side comparison against other models' benchmarks for this release.
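
A minimal sketch of what "internal benchmarks" can look like in practice: a handful of your own prompts with checkable outputs (model names and cases here are placeholders):

    from openai import OpenAI

    client = OpenAI()

    # Tiny internal eval: each case pairs a prompt with a pass/fail check.
    CASES = [
        ("Reply with exactly the word OK.", lambda out: out.strip() == "OK"),
        ("What is 17 * 23? Reply with the number only.", lambda out: out.strip() == "391"),
    ]

    def pass_rate(model: str) -> float:
        hits = 0
        for prompt, check in CASES:
            resp = client.chat.completions.create(
                model=model, messages=[{"role": "user", "content": prompt}]
            )
            hits += check(resp.choices[0].message.content or "")
        return hits / len(CASES)

    for model in ("gpt-5.1", "gpt-5.2"):
        print(model, pass_rate(model))

Swap in your own cases; the point is that pass/fail is defined by you, not by a leaderboard.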

fulafel yesterday at 6:27 PM

So GDPval is OpenAI's own benchmark. PDF link: https://arxiv.org/pdf/2510.04374
