Hacker News

krackers today at 3:14 AM

> this uses a harness

This seems like an arbitrary restriction. Tool-use requires a harness, and their whitepaper never defines exactly what counts as valid.


Replies

fermentation today at 5:43 AM

Right, fair, but look at the prompt. For the purpose of testing general intelligence, this seems kind of pointless.

UltraSane today at 6:13 AM

It isn't arbitrary. They want to measure the capability of the general LLM.
