
scottyah, yesterday at 7:09 PM

If you don't spend any time comparing models to the point where you don't know about benchmarks, why do you care where people think the line for SOTA is?


Replies

mikkupikku, yesterday at 7:20 PM

The benchmark game is wholly gamed, but the proof is in the pudding. I know people using Anthropic, OpenAI, and Gemini. Chinese models locally. But who uses Grok for anything but porn? Whatever the benchmarks might say, Grok is just trash in practice. They spent too much time teaching it to be edgy and not enough time teaching it to code.
