Not to single you out, parent commenter, but I really hope the quality of discourse on HN will move past these basic comparisons eventually. It seems like every thread on every model release has the exact same comments.
"Wow, X model is Y% better or worse than Claude Z model on T benchmark"
"That's irrelevant, they're just benchmaxing."
"Not useable for daily coding or agentic workloads, the vibes are totally wrong."
"It's almost as good, and costs a lot less, so I will absolutely use it."
"I cannot imagine justifying using these, as the step change means open models' lower costs do not make up for the productivity loss."
I'm an unhappy Anthropic customer and really rooting for open models and non-gatekept intelligence, but how do we move on from this now meme-like rigmarole of model release discourse? I do not know what the alternative would be. I don't design LLMs or benchmarks, and I genuinely appreciate that people do their best to provide information, even if it's imperfect here. I'm sure most of you who actively read these comment pages on announcements must feel similarly, though, right?
Yeah, you definitely have to be skeptical of sentiment about open/local model capabilities, since it's biased by what people want to be true.
I generally agree with this in spirit https://www.seangoedecke.com/are-new-models-good/ , but I think you can read Anthropic's results showing Sonnet 5 as almost strictly worse than Opus 4.8 as very credible/meaningful, and then draw comparisons from that.
"It's totally obvious they quantized Claude Z"
I'm not sure what else can be said? I've found benchmarks to be a very weak signal for how good/bad the model is, but it's the #1 thing the companies highlight.
20 minutes after the announcement, there's no really useful statement that can be made about it.