I mean, I guess, but the diffusion objective and the ability to decode simultaneously both dictate pretty different architectures in practice.
Apparently not. See https://arxiv.org/abs/2511.15927v3