UltraSane today at 10:51 AM

They want the LLM that takes ARC-AGI-3 to be the same LLM that everyone uses.


Replies

fc417fc802 today at 11:14 AM

Rephrase that in terms of the human mechanic and hopefully you can see the error of that reasoning. LLMs that perform tasks (as opposed to merely holding conversations) use tools just like we do. That's literally how we design them to operate.

In fact, the LLMs that everyone uses today typically have access to specialized, task-specific tooling. Obviously, specialized tools aren't appropriate for a test that measures the ability to generalize, but generic tools are par for the course. Writing a bot to play a game for you would certainly serve to demonstrate an understanding of the task.
