thesz • last Sunday at 7:04 PM • 1 reply

> I wonder how much of this is due to Diffusion models having less capacity for memorization than auto regressive models

Diffusion requires more compute than autoregressive models, and the excess is proportional to the sequence length. Time-dilated RNNs and adaptive computation in image recognition hint that we can compute more with the same weights and achieve better results.

Which, I believe, also hints at at least one flaw of the TS study: I did not see that they matched DLM and AR by compute; they matched them only by weights.

Replies

heyitsguay • last Sunday at 7:29 PM

Do you have references on adaptive methods for image recognition?
