Hacker News

Grimblewald, yesterday at 10:18 PM

Isn't it an open secret that benchmarks are largely irrelevant at this point? Why else do we all have a personalized test battery for new models? That said, I've stopped testing ChatGPT entirely. It's still OK, but it's beaten by local models and gets thrashed by non-OAI frontier providers. I get the history, but holding up OAI outputs as equivalent is like comparing Yahoo to Google after Yahoo's collapse in search.

OAI language models are largely irrelevant at this point, imo.