Hacker News

ComputerGuru · yesterday at 7:02 PM · 4 replies

Wish they would include or leak more info about what this is, exactly. 5.1 was just released, yet they are claiming big improvements (on benchmarks, obviously). Did they purposely hold back the best they had to keep some cards to play in case Gemini 3 succeeded, or is this a tweak that uses more time/tokens to get better output, or what?


Replies

eldenring · yesterday at 7:08 PM

I'm guessing they were waiting to figure out more efficient serving before a release, and have decided to eat the inference cost temporarily to stay at the frontier.

famouswaffles · yesterday at 7:12 PM

OpenAI sat on GPT-4 for 8 months, and even released GPT-3.5 months after GPT-4 was trained. While I don't expect such big lag times anymore, generally it's a given that the public is behind whatever models they have internally at the frontier. By all indications, they did not want to release this yet, and only did so because of Gemini 3 Pro.

nathan-wall · today at 4:56 AM

If you look at their own chart[1], it shows 5.1 lagging behind Gemini 3 Pro on almost every score listed there, sometimes significantly. They needed to come out with something to stay ahead, so my guess is they threw together what they had at their disposal to keep the lead as long as they can.

It sounds like 5.2 has a more recent knowledge cutoff; a reasonable guess is they already had that but were trying to turn it into bigger improvements for a more major 5.5 release, and when Gemini 3 Pro came out they had to rush something out. 5.2 also has a new "Extended Thinking" option for Pro. I'm guessing they just turned up a lever that tells it to think even longer, which helps them score higher, even if it takes a long time. (One thing about Gemini 3 Pro is that it's very fast relative to even ChatGPT 5.1 Pro Thinking. The scores they're putting out to show they're staying ahead aren't showing that piece.)

[1] https://imgur.com/e0iB8KC

dalemhurley · yesterday at 7:20 PM

My guess is they develop multiple models in parallel.