
mxwsn · yesterday at 8:07 PM

Diffusion and flow matching models generate samples by iterative denoising: run a forward pass of the neural network, feed the output back in as the next input, and repeat. Typically you do this on the order of 100 times, which makes sampling slow and expensive.
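As a minimal sketch of what that loop looks like (names are hypothetical, not from the post; `velocity_net(x, t)` is assumed to be a trained network predicting the velocity dx/dt):

```python
import torch

def sample_iteratively(velocity_net, x_noise, num_steps=100):
    """Generate a sample by Euler-integrating a learned velocity field.

    velocity_net(x, t) -> dx/dt is an assumed interface; each loop
    iteration costs one full forward pass of the network.
    """
    x = x_noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt)  # current time, per sample
        x = x + dt * velocity_net(x, t)        # ~100 forward passes total
    return x
```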

Flow maps / consistency models / shortcut models instead try to compress this iterative work into a single forward pass, making inference up to 100x faster since the network only runs once. Beyond speeding up inference, there are additional benefits, such as improved ability to perform inference-time steering.
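By contrast, a flow map network is queried just once. A sketch under the same assumptions (here `flow_map_net(x, s, t)` is a hypothetical interface that predicts the state at time t given state x at time s):

```python
import torch

def sample_one_shot(flow_map_net, x_noise):
    # Jump directly from t=0 (pure noise) to t=1 (data) in a single
    # forward pass; flow_map_net's interface is an assumption.
    s = torch.zeros(x_noise.shape[0])
    t = torch.ones(x_noise.shape[0])
    return flow_map_net(x_noise, s, t)
```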

Mathematically, learning a flow map corresponds to learning the solution operator of an ordinary differential equation, i.e., the time integral of the velocity field. This foundation gives rise to various training objectives for flow maps, which rely on self-referential identities or on identities such as the transport equation, both discussed in the blog post.
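Concretely, in standard flow-map notation (not taken verbatim from the post), the flow map X_{s,t} is defined by the ODE it solves, and the "self-referential" structure is its semigroup property:

```latex
% The flow map X_{s,t} sends the state at time s to the state at time t:
\partial_t X_{s,t}(x) = v\bigl(X_{s,t}(x),\, t\bigr), \qquad X_{s,s}(x) = x,
\quad\Longleftrightarrow\quad
X_{s,t}(x) = x + \int_s^t v\bigl(X_{s,\tau}(x),\, \tau\bigr)\, d\tau.

% Self-referential (semigroup) identity exploited by consistency-style
% training objectives, for s \le u \le t:
X_{u,t}\bigl(X_{s,u}(x)\bigr) = X_{s,t}(x).
```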

Hope that helps! I'm an ML researcher currently researching flow maps.


Replies

cshimmin · yesterday at 9:13 PM

Very helpful! Naïve question (I haven’t had a chance to read TFA, and diffusion/flow models are not my area of expertise): doesn’t learning the integral/solution of the diffusion process in a single pass just take us back to the OG generative CNNs we had before diffusion models took over? Surely the answer is “no”, but I’d love to hear your framing of why.

richard___ · yesterday at 9:40 PM

Why is self-distillation necessary? Why can't they get ground truth for the "skipped" steps directly?