Hacker News

AndrewAndrewsen, today at 8:58 AM

Awesome project! I recently ran a (semi-)crowdsourced quality benchmark for models ≤20B.

How do you benchmark them? This would be awesome to implement on the page as well. I will link to this project at https://mlemarena.top/