Hacker News

crimsoneer today at 3:36 PM

If someone is using these models, they probably can't or won't use the existing SOTA models, so not sure how useful those comparisons actually are. "Here is a benchmark that makes us look bad from a model you can't use on a task you won't be undertaking" isn't actually helpful (and definitely not in a press release).


Replies

constantcrying today at 3:50 PM

Completely agree that there are legitimate reasons to prefer comparisons to, e.g., DeepSeek models. But that doesn't change my point: we both agree that the comparisons would be extremely unfavorable.
