Note that this uses a harness, so it doesn't qualify for the official ARC-AGI-3 leaderboard.
According to the authors, the harness isn't ARC-AGI-specific, though: https://x.com/agenticasdk/status/2037335806264971461
> this uses a harness
This seems like an arbitrary restriction. Tool-use requires a harness, and their whitepaper never defines exactly what counts as valid.
Don't the chat versions of ChatGPT and Gemini also interleave tool calls? Do those count as using a harness too?
I for one think that harness development is perhaps the most interesting part at the moment and would love to have an alternative leaderboard with harnesses.
It is 100% ARC-AGI-3 specific though; just read through the prompts: https://github.com/symbolica-ai/ARC-AGI-3-Agents/blob/symbol...