krackers • today at 3:14 AM • 2 replies

> this uses a harness

This seems like an arbitrary restriction. Tool-use requires a harness, and their whitepaper never defines exactly what counts as valid.

fermentation • today at 5:43 AM

Right, fair, but look at the prompt. For the purpose of testing general intelligence, this seems kind of pointless.

UltraSane • today at 6:13 AM

It isn't arbitrary. They want to measure the capability of the general LLM.

