Hacker News

psc007 today at 5:16 AM

ELI5? How does this compare to a regular LLM assistant model like the base Qwen?


Replies

gavmor today at 5:39 AM

A regular LLM acts as a "policy," mapping a current state to a specific action (states → actions). Their new LLM acts as a "world model," mapping a current state and a chosen action to a predicted future state ((states, actions) → subsequent states). Instead of deciding "what to do," its explicit objective is to predict the exact environment observation that will result from the interaction history and the agent's current action.
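To make the policy vs. world model distinction concrete, here is a minimal Python sketch of the two signatures. Everything in it is hypothetical. `State` and `Action` are plain strings standing in for serialized observations and actions. `toy_policy`, `make_table_world_model` and `imagine_rollout` are illustrative names. The "world model" is a lookup table rather than an LLM. It shows only the interface, not how Qwen's model works.

```python
from typing import Callable, Dict, List, Tuple

State = str   # hypothetical: a serialized observation, e.g. an HTML snippet
Action = str  # hypothetical: an action string such as "click:submit"
History = List[Tuple[State, Action]]

# Policy: state -> action ("what to do")
Policy = Callable[[State], Action]

# World model: (history, current state, action) -> predicted next state ("what will happen")
WorldModel = Callable[[History, State, Action], State]


def toy_policy(state: State) -> Action:
    """Hypothetical policy: submit the form if a submit button is visible."""
    return "click:submit" if "<button id='submit'>" in state else "noop"


def make_table_world_model(table: Dict[Tuple[State, Action], State]) -> WorldModel:
    """Hypothetical world model backed by a lookup table instead of an LLM.

    A real world model would condition on the history too; this toy ignores it.
    Unknown (state, action) pairs are predicted to leave the state unchanged.
    """
    def predict(history: History, state: State, action: Action) -> State:
        return table.get((state, action), state)
    return predict


def imagine_rollout(policy: Policy, model: WorldModel, start: State, steps: int) -> History:
    """Roll the policy forward inside the world model, without touching a real environment."""
    history: History = []
    state = start
    for _ in range(steps):
        action = policy(state)                      # policy decides what to do
        next_state = model(history, state, action)  # world model predicts the outcome
        history.append((state, action))
        state = next_state
    return history


form = "<form><button id='submit'>Send</button></form>"
done = "<p>Thanks!</p>"
model = make_table_world_model({(form, "click:submit"): done})
print(imagine_rollout(toy_policy, model, form, steps=2))
```

The two pieces are complementary. The policy picks actions, and the world model answers "what happens if I do this?", so an agent can try actions in imagination before running them on a real machine.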

I assumed at first that it was trained on synthetic data, but they actually went and deployed real physical hosts and virtual machines (e.g. Ubuntu, macOS, and Android) and browsers. They ran agentic systems on these continuously and recorded the actual, real-world interactions.
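A recorded agent log can be read either way, as policy data or as world-model data. Here is a minimal sketch, assuming each trajectory is logged as a list of (observation, action) steps plus the final observation. `Step`, `WorldModelExample`, `world_model_examples` and `policy_examples` are hypothetical names for illustration, not anything from Qwen's release or data pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    observation: str  # what the agent saw, e.g. an accessibility-tree dump (assumed format)
    action: str       # what the agent did next


@dataclass
class WorldModelExample:
    history: List[Tuple[str, str]]  # earlier (observation, action) pairs
    observation: str                 # current observation
    action: str                      # action taken from it
    target: str                      # the observation that actually followed


def world_model_examples(steps: List[Step], final_observation: str) -> List[WorldModelExample]:
    """Turn one recorded trajectory into (history, state, action) -> next-state examples."""
    observations = [s.observation for s in steps] + [final_observation]
    examples = []
    for i, step in enumerate(steps):
        history = [(s.observation, s.action) for s in steps[:i]]
        examples.append(WorldModelExample(history, step.observation, step.action, observations[i + 1]))
    return examples


def policy_examples(steps: List[Step]) -> List[Tuple[str, str]]:
    """The same log read as policy data: observation -> action."""
    return [(s.observation, s.action) for s in steps]
```

The targets are real next observations rather than synthetic ones. That is why running agents on actual hosts and recording what happened is useful.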

So it's an LLM that infers the next state, or outcome, as structured data, e.g. literal HTML, UI view hierarchies, or accessibility trees.

Freedumbs today at 8:56 AM

Same thing, but Qwen has decided to rebrand certain LLMs that were trained slightly differently as "world models", despite the fact that "world model" typically means a non-LLM model.