Hacker News

cubefox, today at 12:31 PM

This doesn't mention the main drawback of diffusion language models, which is also the main reason nobody is using them: they score significantly lower on benchmarks than autoregressive models of similar size.