zozbot234 • today at 3:44 PM • 1 reply

> Instead, diffusion models work much faster for individual prompts, and not by a small margin either.

Diffusion models can't really be trained beyond low-to-mid size and have lower quality than an equally sized, plain one-token-at-a-time model.
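The quoted speed claim largely comes down to how many sequential forward passes decoding needs. Below is a minimal toy sketch that counts them. It assumes a masked-diffusion decoder that commits a fixed number of tokens per denoising step (`tokens_per_step` is a hypothetical parameter; real samplers vary), and compares that with one pass per generated token for autoregressive decoding. It only counts passes. It ignores per-pass cost (diffusion passes usually run over the whole sequence without a KV cache), prompt prefill, and output quality, which is the point the reply above is making.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """One sequential forward pass per generated token (prefill ignored)."""
    return num_tokens


def masked_diffusion_passes(num_tokens: int, tokens_per_step: int) -> int:
    """Toy masked-diffusion decoder (assumption): each step predicts all
    masked positions in parallel and commits a fixed number of them."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be >= 1")
    return math.ceil(num_tokens / tokens_per_step)


def simulate_masked_diffusion(num_tokens: int, tokens_per_step: int) -> int:
    """Run the unmasking loop explicitly and count the steps taken."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be >= 1")
    masked = set(range(num_tokens))
    steps = 0
    while masked:
        # A real model would keep its most confident predictions here;
        # this toy just commits the lowest remaining positions.
        to_commit = sorted(masked)[:tokens_per_step]
        masked.difference_update(to_commit)
        steps += 1
    return steps


if __name__ == "__main__":
    n = 256
    for k in (1, 4, 16):
        print(f"tokens_per_step={k}: autoregressive={autoregressive_passes(n)}, "
              f"diffusion={masked_diffusion_passes(n, k)}")
```

With 256 tokens and 16 tokens committed per step, the toy decoder needs 16 sequential passes instead of 256. Whether that translates into wall-clock speedups depends on the per-pass cost the sketch leaves out.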

Replies

embedding-shape • today at 4:00 PM

As mentioned, I've just finished the implementation and started playing around with it, and inside my own agent harness it seems to do about as well as similarly sized "traditional" LLMs. Of course, neither comes close to SOTA models, but I suppose if we can figure out the scaling issues you mention, we'd get a bit closer. The performance just feels too good to ditch diffusion so quickly. Do you have more info on what those "can't be trained beyond low/mid size" issues are in practice today?
