Hacker News

dmos62 (today at 7:47 AM)

So, if I'm reading this correctly: given a prompt to edit a file, a regular LLM would infer a sed call, whereas this "world" model infers the resulting contents of the file.
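A minimal Python sketch of that contrast, assuming two hypothetical stand-in functions (no real model is called). The tool-using LLM returns a command that something else must execute. The world model returns the predicted file contents directly, simulated here with a plain string replacement.

```python
def tool_llm_edit(prompt: str) -> str:
    """Hypothetical tool-using LLM: emits a shell command for an executor to run."""
    # Hard-coded for illustration; a real model would generate this text.
    return "sed -i 's/foo/bar/g' config.txt"


def world_model_edit(file_contents: str, prompt: str) -> str:
    """Hypothetical world model: predicts the file's contents after the edit."""
    # Simulated with str.replace; a real world model would generate the new
    # contents directly instead of producing a command to run.
    return file_contents.replace("foo", "bar")


before = "name = foo\nalias = foo2\n"
print(tool_llm_edit("replace foo with bar"))             # a command, not a result
print(world_model_edit(before, "replace foo with bar"))  # the resulting file text
```

The sed output still needs a shell to take effect. The world model's output *is* the predicted next state.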


Replies

kakugawa (today at 7:57 AM)

Here's the demo: https://docs.qwenlm.ai/resources/mlu56_demo.html

Here's the description of the world model prompt for the web domain: "A precise GUI state simulator — given the current screen (as HTML) and a user action, predicts the exact next screen as a complete, self-contained HTML document." (You can click the world model prompt box to expand it and see the full prompt.)

So the world model generates the current state (an HTML document), an agent tells it which action it wants to perform, and the world model then generates the next state (another HTML document).
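Roughly, that loop could look like the Python sketch below. It assumes hypothetical signatures: a world model maps (current HTML, action) to the next HTML, and an agent maps the current HTML to an action string. The toy world model and agent are made-up stand-ins for the real LLMs. For simplicity the initial state is passed in rather than generated by the world model.

```python
import re
from typing import Callable, List

# Hypothetical interfaces for illustration only.
WorldModel = Callable[[str, str], str]  # (current_html, action) -> next_html
Agent = Callable[[str], str]            # current_html -> action


def rollout(initial_html: str, world_model: WorldModel, agent: Agent, steps: int) -> List[str]:
    """Alternate agent actions and world-model predictions, keeping every state."""
    states = [initial_html]
    for _ in range(steps):
        action = agent(states[-1])                        # agent picks an action
        states.append(world_model(states[-1], action))    # model predicts next screen
    return states


def toy_world_model(html: str, action: str) -> str:
    # Stand-in for the real model: understands a single "click" action.
    count = int(re.search(r"count=(\d+)", html).group(1))
    if action == "click #increment":
        count += 1
    return f"<html><body><p>count={count}</p><button id='increment'>+</button></body></html>"


def toy_agent(html: str) -> str:
    # Stand-in agent that always clicks the same button.
    return "click #increment"


start = "<html><body><p>count=0</p><button id='increment'>+</button></body></html>"
states = rollout(start, toy_world_model, toy_agent, steps=3)
print(states[-1])
```

Nothing in the loop ever renders or executes the HTML. Each "screen" is just text the world model predicts from the previous one.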

The other domains are similar, but w/ domain-specific nuance.
