Hacker News

cjsaltlake yesterday at 6:42 PM

Code clash, I think, would be quite hard to game or to contaminate unintentionally, considering that models need to compete against one another.


Replies

gertlabs yesterday at 7:05 PM

https://gertlabs.com already does this at scale.

An industry-standard benchmark shouldn't be hosted or designed by a lab producing the models, regardless.

Bombthecat yesterday at 6:53 PM

I mean the data/benchmarks.