I'm surprised anyone took them seriously in the first place.

ModernMech • yesterday at 7:59 PM

Replies

What else can people do? Try the dozens of commercial offerings themselves? Okay, I suppose that's doable: you task one engineer with trying them one by one for a month. But then the next model drops and you start all over again...

But then what about local models? You have hundreds of variations to test yourself. It's simply not doable unless it's your full-time hobby.

You need benchmarks to at least separate the wheat from the chaff, so you're left with only a few choices to test yourself.

subulaz • yesterday at 8:10 PM

a LOT of the people who love benchmarks are middle management hard-selling GenAI/LLM as magic tech sauce to vaguely technical executives who only want to know about the money, aka the headcount savings they so desperately desire.

their collective butts are already glued to the hype train as they chase numbers they (often) manufactured to justify the latest round of tech spend.

lots of good use cases out there - like the incredible progress with medical imaging analysis or complex system models for construction - and lots of crap use cases that need benchmarks to cosplay relevance.

operatingthetan • yesterday at 8:01 PM

We need good benchmarks or we are just left following the hype train.
