Hacker News

convexly, yesterday at 11:38 PM

My issue with AGI benchmarks is that you can never tell whether you're measuring actual capability or just how much the training data overlapped with the test set.