DetroitThrow • today at 5:11 AM

Um, yes, this is an extremely specific benchmark harness. It has a ton of knowledge encoded about the tasks at hand. The tweet is dishonest even in the best light.

The hard part of these tests isn't purely reasoning ability ffs.
