I wish there were more of this research to speed things up, rather than building ever-larger models.
Is anyone doing any form of diffusion language models that are actually practical to run today on the actual machine under my desk? There are loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.
I'd love to know what's going on with the Gemini Diffusion model: they had a preview last May and it was crazy fast, but I've heard nothing since.
Seeing half of an AR LLM's output tokens go to generating a predefined JSON schema bothers me so much. I would love to have the option to use diffusion for infilling.
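To make that concrete, here's a toy sketch of schema-pinned infilling (everything here is hypothetical: `fake_denoise` is a random stand-in for a real diffusion LM, not any actual API). The structural tokens are frozen, so denoising steps are only spent on the value slots:

```python
import random

MASK = "<MASK>"

def fake_denoise(tokens, pos):
    # Stand-in for one denoising step at a masked position;
    # a real diffusion LM would predict this from context.
    return random.choice(['"Alice"', '"Bob"', '41'])

def infill(template, denoise, steps=4):
    tokens = list(template)
    frozen = [t != MASK for t in tokens]   # schema tokens are pinned, never resampled
    for _ in range(steps):                 # iterative parallel refinement
        tokens = [t if keep else denoise(tokens, i)
                  for i, (t, keep) in enumerate(zip(tokens, frozen))]
    return tokens

# The braces, keys, and punctuation cost nothing: only the two
# value slots are ever sampled.
template = ['{', '"name"', ':', MASK, ',', '"age"', ':', MASK, '}']
print(" ".join(infill(template, fake_denoise)))
```

An AR model would pay for every brace and key in that template; here they're free by construction.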
Releasing this on the same day as Taalas's 16,000-token-per-second acceleration of the roughly comparable Llama 8B model must hurt!
I wonder how far down a diffusion LM can be scaled. I've been playing with in-browser models, and the speed is painful.
A lot of this post-training recipe feels reminiscent of DINO training (teacher/student, use of stop gradients). I wonder if the more recent LeJEPA SIGReg regularization research might be relevant here for simpler post-training.
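For anyone who hasn't seen it, the pattern I'm alluding to is roughly this minimal PyTorch sketch of the teacher/student + stop-gradient + EMA idea (simplified to an MSE objective; this is not the actual recipe from the post or from DINO):

```python
import torch

student = torch.nn.Linear(16, 16)
teacher = torch.nn.Linear(16, 16)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)                 # teacher is never trained directly

opt = torch.optim.SGD(student.parameters(), lr=1e-2)

x = torch.randn(8, 16)
x_student = x + 0.1 * torch.randn_like(x)   # two "views" of the same input
target = teacher(x).detach()                # stop-gradient on the teacher branch

loss = (student(x_student) - target).pow(2).mean()
loss.backward()
opt.step()

with torch.no_grad():                       # EMA: teacher slowly tracks the student
    m = 0.99
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)
```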
I do wonder why diffusion models aren't used alongside constrained decoding for programming; surely it makes more sense than using an auto-regressive model.
Google is working on a similar line of research. I wonder why they haven't rolled out a GPT-4o-scale version of this yet.
I think diffusion makes much more sense than auto-regression (AR) for code generation specifically, compared to chatbots.
Is this available as open source anywhere to try?
This doesn't mention the main drawback of diffusion language models, and the reason nobody is using them: they score significantly lower on benchmarks than autoregressive models of similar size.
Can't wait for the day I can actually try a diffusion model on my own machine (128GB M4 Max) rather than as a hosted service. So far I haven't seen a single piece of software that supports it.
If this means there’s a 2x-7x speedup available to a scaled diffusion model like Inception Mercury, that’ll be a game changer. It feels 10x faster already…
Diffusion model papers are always interesting to read, but I always feel like they need some mechanism to insert or delete tokens. In the figure in this post, once the model has fixed "British munchkin cats _ _ and ...", you _can't_ get to "British munchkin cats are a new and controversial breed." because there isn't the right number of tokens between "cats" and "and". In a coding context, if your model samples a paren or a comma or something that's entirely plausible at that position, it can still close off an expansion that would have been syntactically correct.
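The fixed-slot problem is easy to see in a few lines. This toy check assumes a hypothetical whitespace tokenization, and I've filled in the figure's "..." with the tail of the target sentence just to make the example concrete:

```python
MASK = "_"

def reachable(committed, target, mask=MASK):
    # A fixed-length denoiser can only rewrite mask slots in place:
    # committed tokens stay put and there are no insert/delete
    # operations, so the target must match slot-for-slot.
    if len(committed) != len(target):
        return False
    return all(c in (mask, t) for c, t in zip(committed, target))

committed = "British munchkin cats _ _ and controversial breed .".split()
target = "British munchkin cats are a new and controversial breed .".split()
print(reachable(committed, target))  # False: "are a new" is 3 tokens for 2 slots
```

Any insert/delete mechanism (or variable-width slots) would make the target reachable again.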