Hacker News

adamandsteve yesterday at 4:57 PM

"The community" is astroturfed as hell though. Anthropic pays influencers to promote Claude Code and likely bots a ton as well, so it's hard to come to any kind of consensus online. Even if everyone was acting in good faith, some people will have a much better experience than others because of the domain they're working in (e.g. AI being much better at frontend and commonly used libraries).

The only real way to evaluate a model is to test it yourself, but that's exhausting for each new model and not comprehensive anyway.


Replies

InsideOutSanta yesterday at 5:48 PM

Yeah, it's crazy that there is no trustworthy source for model reviews. I'd love to know how well the new Deepseek 4 actually performs, for example, but I don't want to spend the next week testing it out. Reddit used to be a somewhat useful gauge, but now there are posts on how 4 is useless right next to posts on how amazing it is. And I have no idea if this is astroturfing, or somebody using a quantized version, or different workloads, or what.

I also find it increasingly difficult to evaluate the models I actually do use. Sometimes each new release seems identical to or only marginally better than the previous version, but when I then go back two or three versions, I suddenly find that older model to be dramatically worse. But was that older model always that quality, or am I now being served a different model under the same version name?

It's all just so opaque.
