I'm a fan of this direction. For me the most interesting use case for these world models isn't even training, it's verification. If this thing, or some idealized version of it, can actually simulate state transitions reliably, could you use it to verify an agent's execution path against hard constraints and replace (or at least eclipse) LLM-as-a-judge?
Well, if you can do this, then you don't delegate execution path derivation to the agent in the first place. The benefit is a predictable, coherent world state where you understand the impact of { current state } x { action } without having to enumerate that huge Cartesian product.
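
To make both ideas concrete, here is a rough sketch, not anything from the actual system. It assumes the world model can be wrapped as a deterministic `step(state, action)` function and that hard constraints are plain predicates on states. The toy counter model and the names `step`, `verify_path`, and `plan` are all made up for illustration. `verify_path` replays an agent's proposed actions and reports the first violation; `plan` derives a path directly by searching the model, visiting only reachable states rather than the full { state } x { action } product.

```python
from collections import deque

def step(state, action):
    """Hypothetical stand-in for a learned world model: a toy integer counter."""
    if action == "inc":
        return state + 1
    if action == "double":
        return state * 2
    raise ValueError(f"unknown action: {action}")

def verify_path(step_fn, start, actions, constraints):
    """Replay actions through the model.

    Returns the index of the first action whose resulting state breaks a
    constraint, or None if every simulated state is acceptable.
    """
    state = start
    for i, action in enumerate(actions):
        state = step_fn(state, action)
        if not all(ok(state) for ok in constraints):
            return i
    return None

def plan(step_fn, start, actions, is_goal, constraints, max_depth=10):
    """Breadth-first search over the model for a shortest constraint-safe path.

    Only states actually reached are expanded. Returns a list of actions,
    or None if no path is found within max_depth.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = step_fn(state, action)
            if nxt in seen or not all(ok(nxt) for ok in constraints):
                continue
            seen.add(nxt)
            frontier.append((nxt, path + [action]))
    return None

# Hard constraint (assumed for the example): the counter must never exceed 10.
constraints = [lambda s: s <= 10]

bad_index = verify_path(step, 1, ["double"] * 4, constraints)
derived = plan(step, 1, ["inc", "double"], lambda s: s == 10, constraints)
```

Here the agent's path 1 -> 2 -> 4 -> 8 -> 16 is rejected at its fourth action, while the search finds a safe route to 10 on its own. All of this only holds to the extent the model's transitions can be trusted, which is the "idealized version" caveat above.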