Hacker News

schmorptron, today at 5:44 PM (1 reply)

What would a diffusing reasoning model look like? Would it have a [thinking] block of pre-defined length that gets diffused over a long time, with the final output block then using what is in that thinking block as part of its input? And how do diffusion models decide the output length in the first place: is it a pre-set parameter, or does the model diffuse an [end] token into the middle somewhere?
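
To make the two options in the question concrete, here is a toy sketch, not how any real diffusion LLM is known to work. It assumes a fixed-length canvas of [mask] slots that gets filled in over a few steps, and an [end] token that, if it lands somewhere in the middle, cuts the output short. `toy_denoiser`, `diffuse`, and `truncate_at_end` are hypothetical names, and the "model" is just a seeded random stub.

```python
import random

MASK, END = "[mask]", "[end]"
VOCAB = ["the", "cat", "sat", "down", END]

def toy_denoiser(canvas, rng):
    """Hypothetical stand-in for a trained model: proposes a
    (token, confidence) pair for every slot that is still masked."""
    return {i: (rng.choice(VOCAB), rng.random())
            for i, tok in enumerate(canvas) if tok == MASK}

def diffuse(length, steps, seed=0):
    """Fill a fixed-length canvas by committing a few slots per step."""
    rng = random.Random(seed)
    canvas = [MASK] * length        # option 1: length is a pre-set parameter
    per_step = -(-length // steps)  # ceiling division: slots committed per step
    for _ in range(steps):
        proposals = toy_denoiser(canvas, rng)
        if not proposals:
            break                   # nothing left to unmask
        # Commit the most confident proposals; the rest stay masked for now.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:per_step]:
            canvas[i] = proposals[i][0]
    return canvas

def truncate_at_end(canvas):
    """Option 2: keep only what comes before the first [end] token, if any."""
    return canvas[:canvas.index(END)] if END in canvas else canvas

canvas = diffuse(length=8, steps=4)
print(canvas)
print(truncate_at_end(canvas))
```

Under the same assumptions, a [thinking] block would just be the first K slots of the canvas, with the answer slots conditioned on them. Whether real models do it that way is exactly the open question here.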


Replies

schmorptron, today at 5:47 PM

Got one answer by reading the rest of the comments. It makes sense that the diffusion process is inherently reasoning-like: https://www.inceptionlabs.ai/blog/introducing-mercury-2