One appeal of diffusion models is RL. If generation ends up being a lot faster, you'll be able to do a lot more RL.
If people can make RL scalable -- make it so that RL isn't just a final phase, but something as big as the supervised stage -- then diffusion models are going to have an advantage.
If not, I think autoregressive models will still be preferred. Diffusion outputs become fixed very fast; the model can't actually refine them, so we're not talking about some kind of refinement along the lines of: initial idea -> better idea -> something actually sound.
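To make the "becomes fixed" point concrete, here's a toy sketch of a masked-diffusion-style sampler (purely illustrative, not any real model's decoder). In this common sampler family, each denoising step commits some masked positions to concrete tokens, and committed tokens are never revisited -- so there is no later pass that revises an earlier "initial idea". (Some samplers do allow remasking, but this basic variant doesn't.)

```python
import random

random.seed(0)

MASK = "_"
VOCAB = ["a", "b", "c", "d"]  # toy vocabulary, stands in for real tokens

def denoise_step(seq, n_unmask):
    """Toy stand-in for one reverse-diffusion step: commit a few masked
    positions to concrete tokens. A real model would score positions with
    a network; here we just pick random positions and tokens."""
    masked = [i for i, t in enumerate(seq) if t == MASK]
    for i in random.sample(masked, min(n_unmask, len(masked))):
        seq[i] = random.choice(VOCAB)
    return seq

def decode(length=12, steps=4):
    seq = [MASK] * length
    history = []
    for _ in range(steps):
        seq = denoise_step(seq, n_unmask=length // steps)
        history.append(list(seq))
    return history

history = decode()

# Once a position is unmasked it never changes again: every non-mask
# token in an earlier snapshot is identical in every later snapshot.
for earlier, later in zip(history, history[1:]):
    assert all(a == b for a, b in zip(earlier, later) if a != MASK)
```

The invariant checked at the end is the whole point: tokens are only ever added, never revised, which is why "it can edit" doesn't automatically fall out of the diffusion framing.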
> If not, I think autoregressive models will still be preferred. Diffusion outputs become fixed very fast; the model can't actually refine them, so we're not talking about some kind of refinement along the lines of: initial idea -> better idea -> something actually sound.
I'm really curious about this. I'm but a simple client developer, so I don't actually grok some of the differences.
For lack of a better word, there's a "normie" position that "omg diffusion means it can edit!!111! big unlock!" -- I think that's cute but I also don't see it as intuitively correct. And I guess I don't even know why I don't see it that way. But regardless, it sounds like I'm correct there.
> If not, I think autoregressive models will still be preferred.
But here I get lost. At least so far, diffusion models seem strictly and significantly faster, and on par in quality with autoregressive models of the same parameter count.
If that is the case, why would autoregressive models still be preferred?
Asking this also makes me realize I am treating "diffusion models are better" as a premise, if I'm asserting they're always faster and ~same quality...
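For what it's worth, the speed intuition is usually just a step-count argument. Under the rough (and very hedged) assumption that a single forward pass costs about the same for both families, autoregressive decoding needs one sequential pass per token, while a diffusion sampler needs a fixed number of denoising passes over the whole sequence:

```python
def autoregressive_passes(n_tokens: int) -> int:
    # One sequential forward pass per generated token
    # (ignoring speculative decoding, KV-cache tricks, etc.).
    return n_tokens

def diffusion_passes(denoise_steps: int) -> int:
    # A fixed number of denoising passes, each updating every
    # position in parallel, roughly independent of sequence length.
    return denoise_steps

n = 1024
print(autoregressive_passes(n))   # 1024 sequential passes
print(diffusion_passes(64))       # 64 passes, each parallel across positions
```

The catch is that this says nothing about quality per pass, which is exactly where the "on par at the same parameter count" premise is doing all the work.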
RL suffers both from reward sparsity across a huge number of dimensions and from convergence brittleness: it can be extremely difficult to get it to converge at all.