Hacker News

kgeist today at 1:25 AM

Every new proprietary model is "groundbreaking" and "look, it just solved task X that no other model could solve," only to be referred to as "that crappy previous-generation model" a month later.

So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.


Replies

jbverschoor today at 2:40 AM

Not only that, but to me it seems that after a week the intelligence is being downscaled or routed, maybe because of a lack of capacity.

fsuts today at 6:31 AM

Agreed

matheusmoreira today at 2:58 AM

There's at least the possibility that they intentionally degrade the models as time passes. We can't really verify that we're getting what we're paying for all of the time. All the more reason to invest in local inference.


realusername today at 3:14 AM

There's also a lot of benchmark trickery going on, so it's becoming harder to see how much the latest models have really improved.

The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.


4fffs today at 1:47 AM

Correct. Anything else is pure marketing and you have fallen for it.