
byt3bl33d3r • yesterday at 11:59 PM • 0 replies • view on HN

There’s really no point in looking at benchmarks anymore, since real-world usage of these models varies by task and prompting strategy. Use your own internal benchmarks to evaluate and ignore everything else. I do find it curious that they don’t provide a side-by-side comparison with other models’ benchmarks for this release.
