Hacker News

WarmWash today at 5:07 PM

I am unable to shake the fact that the Chinese models all perform awfully on the private ARC-AGI-2 tests.


Replies

osti today at 7:16 PM

But is ARC-AGI really that useful, though? Nowadays it seems to me that it's just another benchmark that needs to be specifically trained for. Maybe the Chinese models just didn't focus on it as much.
