kqr, today at 11:30 AM

They aren't good at Zork[1], nor at newer or more obscure text adventures[2].

[1]: https://www.lowimpactfruit.com/p/zork-bench-an-llm-reasoning...

[2]: https://entropicthoughts.com/evaluating-llms-playing-text-ad...