Hacker News

bicsi, last Monday at 3:52 AM (3 replies)

What if I told you that one can model bidirectional attention just by recurring over causal attention, and it’s still fast enough? Hint: It’s called chain of thought.

I strongly believe it’s time to discontinue diffusion models, solely because iterated auto-regression is faster, more parallelizable, and just as potent with proper prompting techniques (unless, of course, you consider CoT a form of diffusion, which it essentially is).


Replies

fancyfredbot, last Monday at 8:53 AM

I'd respectfully suggest that it's perhaps not time to "discontinue diffusion models". Minsky and Papert set AI back by decades by suggesting neural networks were a dead end which couldn't learn XOR. There's not a chance of an HN comment having the same effect, of course, but my point is that it's easy to dismiss things prematurely.

smus, last Monday at 4:34 AM

Can you explain how CoT is a form of diffusion or models bidirectional attn?

SalmoShalazar, last Monday at 12:01 PM

Chain of thought is not a form of diffusion. Diffusion models clearly have characteristics that are useful and worthy of further research, and they should not be “discontinued”.