vanuatu • yesterday at 6:00 PM

i worked on one of the benchmarks typically found in new model releases

this benchmark looks very good on methodology. a cog researcher checking the data themselves is very high signal (not scalable, so don't take the benchmark as gospel, but directionally good)
