
ACCount37 • yesterday at 8:30 PM

They stacked the deck. If v2 was still rule inference + spatial reasoning, a bit like a juiced-up Raven's Progressive Matrices, then v3 adds a whole new multi-turn explore/exploit agentic dimension on top of it.

Given how hard even pure v2 was for modern LLMs, I'm not surprised to see v3 crush them. But that won't last.
