Hacker News

Qwen-AgentWorld: Language World Models for General Agents

100 points by ilreb today at 2:21 AM | 27 comments

Comments

Xx_crazy420_xX today at 7:59 AM

I think open-ended simulation for agents will be a key component of training and planning, similar to how human dreams simulate different scenarios in our heads. The biggest challenge will be simulating more abstract and complex systems.

A few months ago I experimented with an open-ended world simulation for an AI agent, in which the simulated world progressively built itself out of each of the agent's actions. The idea was to give the agent unlimited possibilities for tool calling: each tool call would be approved by an adjudicator, and the world state would change accordingly. The key issues with the PoC were:

  - World decoherence (I tried to solve that with a poor graph implementation)
  - World flatness: the high level of abstraction did not account for small events that would compound in the real world
  - Starting with an empty context made it really hard to get the agent to explore the world
  
Anyway, the project became really funny when you watched the agent struggle desperately to perform actions that would be impossible in the real world. The main observation was that showing the agent its current action budget modulated how creative and how desperate its actions were.
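
For illustration, the adjudicated loop described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual PoC: `World`, `Verdict`, `toy_adjudicator`, and `run_episode` are hypothetical names, the world state is reduced to a flat set of facts, and the adjudicator is a hard-coded rule table standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class World:
    # Assumption: the world is a flat set of facts plus a log of attempted actions.
    facts: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    approved: bool
    new_facts: set[str]


Adjudicator = Callable[[World, str], Verdict]


def toy_adjudicator(world: World, action: str) -> Verdict:
    """Hard-coded stand-in for an adjudicator model (hypothetical rules)."""
    if action == "find key":
        return Verdict(True, {"has key"})
    if action == "open door":
        # Only plausible if the world already contains a key.
        return Verdict("has key" in world.facts, {"door open"})
    return Verdict(False, set())


def run_episode(actions: Iterable[str], adjudicate: Adjudicator, budget: int) -> World:
    """Apply free-form actions until the action budget runs out."""
    world = World()
    for action in actions:
        if budget == 0:
            break
        budget -= 1
        verdict = adjudicate(world, action)
        if verdict.approved:
            world.facts |= verdict.new_facts  # the world grows only via approved actions
        status = "ok" if verdict.approved else "rejected"
        world.log.append(f"{action}: {status} (budget left: {budget})")
    return world


world = run_episode(["open door", "find key", "open door", "fly"], toy_adjudicator, budget=3)
```

Here the first `open door` is rejected because the world has no key yet, and `fly` is never attempted because the budget of three actions is already spent. The remaining budget is written into the log so that it could be fed back to the agent.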

adrian_b today at 7:29 AM

The smaller of the two models is open-weights and available on Hugging Face:

https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B

blurbleblurble today at 6:06 AM

This might be pretty big. One of my biggest frustrations with smaller models (especially MoE) is their failure to track workflow state at a high level. I'm constantly reminding them what we decided on or asking them to revisit, and reminding them eats context.

Seems like this might make that a lot less painful. And if not off the bat, with some minimal tuning or even just good prompting.

dippogriff today at 5:47 AM

I'm a fan of this direction. For me the most interesting use case for these world models isn't even training, it's verification. If this thing or some idealized version of it can actually reliably simulate state transitions, could you use it to verify an agent's execution path against hard constraints and replace/eclipse LLMs-as-a-judge?
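
As a rough illustration of that verification idea, the sketch below replays an agent's action sequence through a transition function and reports the first step at which a hard constraint fails. Everything here is an assumption for the sake of the example: `toy_transition` is a deterministic stand-in for a learned world model, and `first_violation`, the `balance` field, and the action names are hypothetical.

```python
from typing import Callable, Optional, Sequence

State = dict[str, int]
Transition = Callable[[State, str], State]
Constraint = Callable[[State], bool]


def toy_transition(state: State, action: str) -> State:
    """Deterministic stand-in for a world model's predicted next state."""
    new_state = dict(state)
    if action == "spend":
        new_state["balance"] -= 60
    elif action == "deposit":
        new_state["balance"] += 50
    return new_state


def first_violation(
    start: State,
    actions: Sequence[str],
    step: Transition,
    constraints: Sequence[Constraint],
) -> Optional[int]:
    """Return the index of the first action whose resulting state breaks a constraint."""
    state = start
    for i, action in enumerate(actions):
        state = step(state, action)
        if not all(check(state) for check in constraints):
            return i
    return None  # the whole path satisfies every constraint


def non_negative_balance(state: State) -> bool:
    return state["balance"] >= 0


bad_path = first_violation({"balance": 100}, ["spend", "spend", "deposit"],
                           toy_transition, [non_negative_balance])
good_path = first_violation({"balance": 100}, ["spend", "deposit", "spend"],
                            toy_transition, [non_negative_balance])
```

The check is only as trustworthy as the transition function, which is exactly the "reliably simulate state transitions" condition above.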

avaer today at 8:10 AM

Note that this can run locally on a gaming card when quantized. I got it running on a 4090 (24 GB) at 150 t/s with a Q4_K_M quant.
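
A back-of-the-envelope check of why that fits, as a hedged sketch: it assumes the "35B" in the model name is the total parameter count and that a 4-bit K-quant averages somewhere around 4.5 to 5 bits per weight. Both figures are assumptions rather than measured values, and the estimate ignores the KV cache and activations.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 35e9  # assumption: total parameters, read off the "35B" in the model name

# Assumption: effective bits per weight for a 4-bit K-quant lies in this range.
low = weight_gb(N_PARAMS, 4.5)
high = weight_gb(N_PARAMS, 5.0)

print(f"weights: roughly {low:.1f} to {high:.1f} GB of a 24 GB card")
```

Under those assumptions the weights alone come to roughly 20 to 22 GB, which leaves only a few GB of headroom for context on a 24 GB card.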

psc007 today at 5:16 AM

ELI5? How does this compare to a regular LLM assistant model like the base Qwen?

aliljet today at 7:24 AM

The benchmarks here are confusing at best. Am I reading correctly that this model is essentially as good as or better than all frontier models right now?

zkmon today at 7:59 AM

What if they did this using GLM 5.2? This looks like a new direction for AI.

ElenaDaibunny today at 7:25 AM

10M trajectories, probably more of a data scale win than a world model breakthrough tbh

Tepix today at 4:50 AM

The labels of the very first chart (figure 1, bottom left) are obviously wrong, which casts doubt on the entire paper.
