Hacker News

osti today at 7:16 PM

But is ARC-AGI really that useful? Nowadays it seems to me that it's just another benchmark that needs to be specifically trained for. Maybe the Chinese models just didn't focus on it as much.


Replies

sdenton4 today at 7:28 PM

Doing great on public datasets and underperforming on private benchmarks is not a good look.
