OK, but then, in this regard, left to right generation is hardy better:
Once you get to "British cats <next-token-here>" you can't get to "British munchkin cats <next-token-here>"; the tokens to the left are done and dusted.
It's kind of a feature. Diffusion is used for images, right? It's like saying, once the image of a door has started to form right next to a kitchen counter, it cannot insert a refrigerator there any more. Well, maybe it doesn't "want to" because that layout is already settled by that time.