It probably traces back to the other day, when roon realized that normal users get slower responses than staff. From there, they presumably realized they could run API calls the way staff requests run: fast, below capacity. Everyone else's billions of calls get left to the remaining capacity.
https://thezvi.substack.com/i/185423735/choose-your-fighter
> Ohqay: Do you get faster speeds on your work account?
> roon: yea it’s super fast bc im sure we’re not running internal deployment at full load
It’s interesting that they kept the price the same even though inference on Cerebras is much more expensive.
OpenAI, in my estimation, has a habit of quietly degrading a model's quality after its introduction. I definitely recall web ChatGPT 5.2 being a lot better when it launched; a week or two later, its quality suddenly dropped. The initial high looked designed to impress journalists and benchmarks. As such, nothing OpenAI says about model speed can be trusted. All they have to do is lower the average reasoning effort, and boom, it's 40% faster. I hope I'm wrong, because if I'm right, it's a con game.
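For context on why this lever is so easy to pull: in the OpenAI Responses API, reasoning effort is just a per-request parameter, so a provider could lower the server-side default without changing the model name users see. A minimal sketch of the payload shape (no network call; "gpt-5.2" is a stand-in model name for illustration):

```python
# Two requests to the same model, differing only in the hidden
# reasoning-effort knob. This builds the request payloads only;
# nothing is sent over the network.
low = {"model": "gpt-5.2", "reasoning": {"effort": "low"}, "input": "2+2?"}
high = {"model": "gpt-5.2", "reasoning": {"effort": "high"}, "input": "2+2?"}

# Same model string in both -- a user comparing model names sees no change.
assert low["model"] == high["model"]
assert low["reasoning"]["effort"] != high["reasoning"]["effort"]
```

The point is that effort lives outside the model identifier, so a speed claim can be satisfied without touching the weights at all.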
Starting ChatGPT Plus web users off on the Pro model, then later swapping in the Standard model, would technically satisfy claims of consistent model behavior while still qualifying as shenanigans.
There are tons of posts on Reddit saying quality also dropped significantly.
This is great.
In the past month, OpenAI has shipped for Codex users:
- subagents support
- a better multi-agent interface (the Codex app)
- 40% faster inference
No joke, with the first two my productivity is already up like 3x. I am so stoked to try this out.