Hacker News
epolanski
•
yesterday at 6:44 PM
I think they are simply evaluated on prompt-to-solution benchmarks.