The smaller of the two models is open weights and available on Huggingface:
This might be pretty big. One of my biggest frustrations with smaller models (especially MoE) is their failure to track workflow state at a high level. I'm constantly reminding them what we decided on or asking them to revisit, and reminding them eats context.
Seems like this might make that a lot less painful. And if not off the bat, with some minimal tuning or even just good prompting.
I'm a fan of this direction. For me the most interesting use case for these world models isn't even training, it's verification. If this thing or some idealized version of it can actually reliably simulate state transitions, could you use it to verify an agent's execution path against hard constraints and replace/eclipse LLMs-as-a-judge?
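To make the verification idea concrete, here is a minimal sketch of replaying an agent's action sequence through a transition function and checking hard constraints after every step. Everything here is an assumption for illustration: `toy_transition` is a hand-written stand-in for whatever next-state prediction a world model would provide, and the state layout, action names, and constraints are hypothetical, not anything from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Action = Tuple[str, object]  # (action name, argument)


def toy_transition(state: State, action: Action) -> State:
    """Hypothetical stand-in for a world model's predicted next state."""
    name, arg = action
    nxt = dict(state)  # copy so the caller's state is not mutated
    if name == "spend":
        nxt["balance"] = nxt["balance"] - arg
    elif name == "delete":
        nxt["files"] = [f for f in nxt["files"] if f != arg]
    return nxt


def verify_path(
    state: State,
    actions: List[Action],
    transition: Callable[[State, Action], State],
    constraints: Dict[str, Callable[[State], bool]],
) -> Optional[Tuple[int, str]]:
    """Return None if every step satisfies all constraints,
    otherwise (index of the first violating step, constraint name)."""
    for i, action in enumerate(actions):
        state = transition(state, action)
        for name, holds in constraints.items():
            if not holds(state):
                return (i, name)
    return None


# Hard constraints expressed as predicates over the simulated state.
constraints = {
    "balance_non_negative": lambda s: s["balance"] >= 0,
    "config_preserved": lambda s: "config.yaml" in s["files"],
}

start = {"balance": 10, "files": ["config.yaml", "tmp.log"]}
ok_path = [("spend", 4), ("delete", "tmp.log")]
bad_path = [("spend", 4), ("spend", 7)]

ok_result = verify_path(start, ok_path, toy_transition, constraints)
bad_result = verify_path(start, bad_path, toy_transition, constraints)
```

The difference from LLM-as-a-judge is that the verdict comes from explicit predicates over the simulated state, so the check is only as trustworthy as the transition function behind it.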
Note this can run locally on a gaming card with quantization. I got it running on a 4090 (24GB) at 150 t/s with a Q4_K_M quant.
ELI5? How does this differ from a regular LLM assistant model like the base Qwen?
The benchmarks here are confusing at best. Am I reading correctly that this model is essentially as good as or better than all frontier models right now?
What if they did this using GLM 5.2? This looks like a new direction for AI.
10M trajectories, probably more of a data scale win than a world model breakthrough tbh
The labels on the very first chart (figure 1, bottom left) are obviously wrong, which casts doubt on the entire paper.
35B model from the qwen-3.5 line
I think open-ended simulation for agents will be a key component of training and planning, similar to how human dreams simulate different scenarios in our heads. The biggest challenge will be simulating more abstract and complex systems.
A few months ago I experimented with an open-ended world simulation for an AI agent, where the simulated world progressively built itself based on each of the agent's actions. The idea was to give the agent unlimited possibilities for tool calling: each tool call had to be approved by the adjudicator, and the world state would then change. The key issues with the PoC were:
Anyway, the project turned out to be really funny when you watched the agent desperately struggling to perform actions that would be impossible in the real world. The main observation was that when the agent was shown its current action budget, it modulated how creative and how desperate its actions were.
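For anyone who wants to picture the loop, here is a rough sketch of the adjudicator-plus-budget setup described above. It is not the actual PoC code. The rule-based `adjudicate` function is a hypothetical stand-in for the adjudicator, the `requires`/`adds` fields on each tool call are invented for illustration, and the rule that every attempt costs one action (approved or not) is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    facts: set = field(default_factory=set)   # what is currently true in the world
    log: list = field(default_factory=list)   # (tool, approved, budget left)


def adjudicate(world: World, call: dict) -> bool:
    """Stand-in adjudicator: approve only if every precondition is already a fact."""
    return set(call["requires"]) <= world.facts


def run_episode(world: World, proposed_calls: list, budget: int) -> World:
    for call in proposed_calls:
        if budget == 0:
            break
        budget -= 1  # assumption: every attempt costs one action, approved or not
        approved = adjudicate(world, call)
        if approved:
            world.facts |= set(call["adds"])  # the world grows with each approved action
        # The remaining budget is recorded so it could be shown back to the agent.
        world.log.append((call["tool"], approved, budget))
    return world


calls = [
    {"tool": "open_door", "requires": ["has_key"], "adds": ["door_open"]},
    {"tool": "find_key", "requires": [], "adds": ["has_key"]},
    {"tool": "open_door", "requires": ["has_key"], "adds": ["door_open"]},
    {"tool": "fly_away", "requires": ["has_wings"], "adds": ["escaped"]},
]

world = run_episode(World(), calls, budget=3)
```

In a real setup the remaining budget from the log would go back into the agent's prompt each turn, which is where the change in behavior would show up.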