Hacker News

kostaj, today at 1:11 PM

Indeed. Real-world claims are somewhat messy. Some of the standard benchmarks, e.g. the questions in AVeriTeC, share similar characteristics.

