What if I told you that one can model bidirectional attention just by recurring over causal attention, and it’s still fast enough? Hint: It’s called chain of thought.
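One way to make the bidirectional-attention claim concrete is a minimal sketch under our own assumptions; the commenter does not spell out a mechanism. With a causal mask, position i can only attend to positions 0..i. If the model first emits a copy of the whole input (a crude stand-in for CoT re-reading it) and keeps attending causally, every token in the second copy can see every token of the input. The function names are hypothetical. The code only compares which positions are visible; it does not implement attention itself.

```python
def causal_visibility(n):
    # visible[i][j] is True when position i may attend to position j (only j <= i)
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_visibility(n):
    # every position may attend to every other position
    return [[True] * n for _ in range(n)]


def repeated_input_visibility(n):
    """Apply a causal mask to the input concatenated with itself (length 2n).

    For each token in the second copy, report which positions of the original
    sequence it can see, through either the first copy or the second.
    """
    mask = causal_visibility(2 * n)
    result = []
    for i in range(n, 2 * n):
        seen = [False] * n
        for j in range(2 * n):
            if mask[i][j]:
                seen[j % n] = True  # map a position in either copy back to the original token
        result.append(seen)
    return result


if __name__ == "__main__":
    n = 4
    print(causal_visibility(n) == bidirectional_visibility(n))
    print(repeated_input_visibility(n) == bidirectional_visibility(n))
```

The second copy sees the full input because the whole first copy lies to its left. This shows only that the visibility pattern matches. It makes no claim that the resulting representations equal those of a bidirectional model, and it says nothing about cost.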
I strongly believe it’s time to discontinue diffusion models, solely because iterated autoregression is faster, more parallelizable, and just as potent with proper prompting techniques (unless, of course, you consider CoT a form of diffusion, which it essentially is).
Can you explain how CoT is a form of diffusion or models bidirectional attn?
Chain of thought is not a form of diffusion. Diffusion models clearly have characteristics that are useful and worthy of further research, and they should not be “discontinued”.
I'd respectfully suggest that it's perhaps not time to "discontinue diffusion models". Minsky and Papert set AI back by decades by suggesting neural networks were a dead end because a single-layer perceptron couldn't learn XOR. There's not a chance of an HN comment having the same effect, of course, but my point is that it's easy to dismiss things prematurely.