Hacker News

andriy_koval, yesterday at 8:14 PM

I think it will make results way better and more representative of model abilities.


Replies

simonw, yesterday at 8:16 PM

It would... but the test is inherently silly, so I'm still not sure if it's worth me investing that extra effort in it.