
unchar1 yesterday at 11:08 AM

It's not just about figuring out whether a model is good at things, but whether it's good at the things I care about.

Using a targeted eval suite (like a test suite) tells us that.
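
To make the test-suite analogy concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration: `CASES`, `run_suite`, and `toy_model` are hypothetical names, and a real suite would call an actual model and probably use better checks than substring matching.

```python
from typing import Callable, List, Tuple

# Hypothetical eval case: a prompt plus a check on the model's output.
EvalCase = Tuple[str, Callable[[str], bool]]

# Illustrative cases only; swap in the tasks you actually care about.
CASES: List[EvalCase] = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("Name the capital of France.", lambda out: "paris" in out.lower()),
]


def run_suite(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose check passes for this model."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call; it only handles the arithmetic prompt.
    return "4" if "2 + 2" in prompt else "I don't know"


if __name__ == "__main__":
    print(f"pass rate: {run_suite(toy_model, CASES):.0%}")
```

The point is that the pass rate is over my cases, not a general benchmark, so two models can be compared on exactly the tasks that matter to me.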