Hacker News

Zobat today at 10:52 AM, 3 replies

As others have hinted, LLMs aren't really built in a way that makes them likely to play video games (CS/Halo and such) well. I wonder how they'd fare "against" text-based adventures like Zork (which they'll no doubt have ample knowledge of) and newer text-based adventure games (which they'll know less about).


Replies

kqr today at 11:30 AM

They aren't good at Zork[1], nor at newer and/or more obscure text adventures[2].

[1]: https://www.lowimpactfruit.com/p/zork-bench-an-llm-reasoning...

[2]: https://entropicthoughts.com/evaluating-llms-playing-text-ad...

fph today at 11:15 AM

Nethack has been widely used to test reinforcement learning agents since at least 2020; there was a Nethack challenge at NeurIPS 2021: https://nethackchallenge.com/report.html

For a more recent test, see https://kenforthewin.github.io/blog/posts/nethack-agent/ .

lou1306 today at 12:08 PM

To be honest, Zork at times makes precious little sense: you are supposed to die over and over before you figure stuff out. For instance, you have to grab the endless-light-source treasure very early on, or you mathematically cannot win. And the game does not spell anything out for you; you just have to "get it" by paying close attention to how/why you die.

This is a tall order for an LLM: it needs a lot of context, but most of that context will be just noise.