Every new proprietary model is "groundbreaking" and "look, it just solved task X that...

kgeist • today at 1:25 AM • 5 replies

Every new proprietary model is "groundbreaking" and "look, it just solved task X that no other model could solve," only to be referred to as "that crappy previous-generation model" a month later.

So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.

Replies

jbverschoor • today at 2:40 AM

Not only that, but it seems to me that after a week the intelligence is being downscaled or routed, maybe because of a lack of capacity.

fsuts • today at 6:31 AM

Agreed

matheusmoreira • today at 2:58 AM

There's at least the possibility that they intentionally degrade the models as time passes. We can't really verify that we're getting what we're paying for all of the time. All the more reason to invest in local inference.

realusername • today at 3:14 AM

There's also a lot of benchmark trickery going on, and it's becoming harder to see how much the latest models have really improved.

The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.

4fffs • today at 1:47 AM

Correct. Anything else is pure marketing and you have fallen for it.
