Hacker News

glimshe, yesterday at 5:24 PM (3 replies)

Without reliable benchmarks, they are Mythos-like only in the sense that they accept text as input and produce text as output.


Replies

chrsw, today at 12:41 AM

I don't even look at benchmarks anymore. I just try different models as they're released on our large, proprietary, systems software codebases in real, shipping products or projects that will ship eventually. It's pretty clear which models help me do my job better or faster. I'm fortunate enough to have the token budget to use basically as much as I need, for now.

No need for benchmarks, evals, marketing, system cards or anything like that. I read the web for tips, practices and release announcements. My colleagues and I share our experiences with each other but beyond that, everything else is just noise.

theplumber, yesterday at 8:53 PM

Well if they are hyped like Mythos then we can add that to the list of “like Mythos”. Perhaps what’s missing is their CEO warning the world that their model is too unsafe to be released on the internet and someone must stop them before it’s too late.

irthomasthomas, yesterday at 9:53 PM

They provide benchmarks in the paper: https://arxiv.org/abs/2606.21228
