
Sir_Twist • yesterday at 6:36 PM • 0 replies

Not an expert in LLM benchmarks, but I generally think of benchmarks as being particularly good for measuring usefulness for certain use cases. Even if measuring LLMs is not as straightforward as, say, comparing read/write speeds across different SSDs, if a certain model's responses are consistently measured as being higher quality / more useful, surely that means something, right?
