alt Hacker News

kqr • today at 11:30 AM

They aren't good at Zork[1], nor at newer and/or more obscure text adventures[2].

[1]: https://www.lowimpactfruit.com/p/zork-bench-an-llm-reasoning...

[2]: https://entropicthoughts.com/evaluating-llms-playing-text-ad...