byt3bl33d3r, yesterday at 11:59 PM

There’s really no point in looking at benchmarks anymore, since real-world usage of these models varies across tasks and prompting strategies. Use your internal benchmarks to evaluate and ignore everything else. It is curious to me that they don’t provide a side-by-side comparison with other models’ benchmarks for this release.