Huge fan that Gemini-3 prompted OAI to ship this.
Competition works!
GDPval seems particularly strong.
I wonder why they held this back.
1) Maybe this is uneconomical ?
2) Did the safety somehow hold back the company ?
looking forward to the internet trying this and posting their results over the next week or two.
COMPETITION!
> I wonder why they held this back.
IMHO, I doubt they were holding much back. Obviously, they're always working on 'next improvements' and rolled what was done enough into this but I suspect the real difference here is throwing significantly more compute (hence investor capital) at improving the quality - right now. How much? While the cost is currently staying the same for most users, the API costs seem to be ~40% higher.
The impetus was the serious threat Gemini 3 poses. Perception about ChatGPT was starting to shift, people were speculating that maybe OAI is more vulnerable than assumed. This caused Altman to call an all-hands "Code Red" two weeks ago, triggering a significant redeployment of priorities, resources and people. I think this launch is the first 'stop the perceptual bleeding' result of the Code Red. Given the timing, I think this is mostly akin to overclocking a CPU or running an F1 race car engine too hot to quickly improve performance - at the cost of being unsustainable and unprofitable. To placate serious investor concerns, OAI has recently been trying to gradually work toward making current customers profitable (or at least less unprofitable). I think we just saw the effort to reduce the insane burn rate go out the window.