Hacker News

Consistency diffusion language models: Up to 14x faster, no quality loss

195 points | by zagwdt | today at 4:15 AM | 88 comments

Comments

abeppu | today at 3:49 PM

Diffusion model papers are always interesting to read, but I always feel like they need some mechanism to insert or delete tokens. In the example in the figure in this post, once it has fixed "British munchkin cats _ _ and ..." you _can't_ get to "British munchkin cats are a new and controversial breed." because there isn't the right number of tokens between "cats" and "and". In a coding context, if your model samples a paren or a comma or something that is entirely plausible at that position, it can still close off an expansion that would be syntactically correct.
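The fixed-slot problem the comment describes can be sketched in a few lines (a toy masked-fill setting, not the paper's actual model; `MASK`, `state`, and `fits` are illustrative names):

```python
MASK = "<mask>"

# Partially-denoised state from the figure: two mask slots between "cats" and "and".
state = ["British", "munchkin", "cats", MASK, MASK, "and", MASK, MASK, MASK]

# The completion the commenter wants needs THREE tokens before "and":
target = ["British", "munchkin", "cats", "are", "a", "new", "and",
          "controversial", "breed", "."]

def fits(state, target):
    """Check whether `target` is reachable by filling masks only
    (no insertions or deletions allowed, as in standard masked diffusion)."""
    if len(state) != len(target):
        return False
    return all(s == MASK or s == t for s, t in zip(state, target))

print(fits(state, target))  # -> False: the lengths differ, so no fill can reach the target
```

Once "and" is committed at position 5, every denoising step preserves the length, so the ten-token target is permanently out of reach.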

MASNeo | today at 7:11 AM

I wish there were more of this research into speeding things up rather than into building ever-larger models.

yjftsjthsd-h | today at 5:47 AM

Is anyone doing any form of diffusion language models that are actually practical to run today on the actual machine under my desk? There's loads of more "traditional" .gguf options (well, quants) that are practical even on shockingly weak hardware, and I've been seeing things that give me hope that diffusion is the next step forward, but so far it's all been early research prototypes.

show 3 replies
simonw | today at 8:46 AM

I'd love to know what's going on with the Gemini Diffusion model: they had a preview last May and it was crazy fast, but I've not heard anything since then.

fumeux_fume | today at 3:22 PM

Seeing half of an AR LLM's output tokens go to generating a predefined JSON schema bothers me so much. I would love to have the option to use diffusion for infilling.
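A rough way to see how much of a structured output is predictable scaffolding (a crude illustration using a regex split rather than a real tokenizer; the example object and counting rule are made up for this sketch):

```python
import json
import re

# A small fixed-schema response an AR model might be asked to emit.
output = json.dumps({
    "name": "munchkin",
    "origin": "United Kingdom",
    "controversial": True,
}, indent=2)

# Crude "tokenization": words and individual punctuation marks.
tokens = re.findall(r'\w+|[^\w\s]', output)

# Scaffold = structural punctuation plus the fixed key names, i.e. everything
# that is fully determined by the schema before generation even starts.
KEYS = {"name", "origin", "controversial"}
scaffold = [t for t in tokens if t in '{}:,"[]' or t in KEYS]

print(f"{len(scaffold)}/{len(tokens)} tokens are predictable scaffolding")
```

An autoregressive model spends a sampling step on every one of those scaffold tokens; an infilling model could treat the schema as given and only denoise the value slots.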

nl | today at 7:30 AM

Releasing this on the same day as Taalas's 16,000 token-per-second acceleration for the roughly comparable Llama 8B model must hurt!

I wonder how far down they can scale a diffusion LM? I've been playing with in-browser models, and the speed is painful.

https://taalas.com/products/

LarsDu88 | today at 8:48 AM

A lot of this post-training recipe feels reminiscent of DINO training (teacher/student, use of stop-gradients). I wonder if the more recent LeJEPA SIGReg regularization research might be relevant here for simpler post-training.

bjt12345 | today at 8:00 AM

I do wonder why diffusion models aren't used alongside constrained decoding for programming - surely it makes better sense than using an auto-regressive model.
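Constrained decoding can be sketched as masking out grammar-violating tokens before sampling (a hypothetical toy grammar over a four-token vocabulary, not any real library's API):

```python
import math

VOCAB = ["(", ")", "x", "+"]

def allowed(prefix):
    """Toy grammar for balanced-paren arithmetic over VOCAB:
    return the set of tokens legal after `prefix`."""
    depth = prefix.count("(") - prefix.count(")")
    last = prefix[-1] if prefix else None
    ok = set()
    if last in (None, "(", "+"):      # expecting an operand
        ok |= {"(", "x"}
    if last in ("x", ")"):            # expecting an operator or a close-paren
        ok.add("+")
        if depth > 0:
            ok.add(")")
    return ok

def constrain(logits, prefix):
    """Mask logits for tokens the grammar forbids at this position."""
    ok = allowed(prefix)
    return [l if t in ok else -math.inf for t, l in zip(VOCAB, logits)]

# After "( x", only "+" or ")" are legal, so "(" and "x" get -inf.
print(constrain([1.0, 2.0, 0.5, 0.3], ["(", "x"]))
```

With an AR model this filter runs once per token, strictly left to right; the commenter's point is that a diffusion model could in principle apply such a filter to many positions in parallel at each denoising step.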

LarsDu88 | today at 5:37 AM

Google is working on a similar line of research. I wonder why they haven't rolled out a GPT-4o-scale version of this yet.

WiSaGaN | today at 9:08 AM

I think diffusion makes much more sense than auto-regressive (AR) generation specifically for code, as opposed to chatbot use.

hanifbbz | today at 8:11 AM

Is this available as open source anywhere to try?

cubefox | today at 12:31 PM

This doesn't mention the drawback of diffusion language models, and the main reason nobody is using them: they score significantly lower on benchmarks than autoregressive models of similar size.

LoganDark | today at 8:13 AM

Can't wait for the day I can actually try a diffusion model on my own machine (128GB M4 Max) rather than as a hosted service. So far I haven't seen a single piece of software that supports it.

refulgentis | today at 5:18 AM

If this means there’s a 2x-7x speed up available to a scaled diffusion model like Inception Mercury, that’ll be a game changer. It feels 10x faster already…
