corysama yesterday at 5:45 PM

Technically, there was one early experiment that tricked Stable Diffusion into generating spectrograms that could be converted into audio, and it worked surprisingly well.

https://web.archive.org/web/20230314190913/https://www.riffu...

https://huggingface.co/riffusion/riffusion-model-v1

But I'd expect everything from the past 3 years to diffuse the audio waveform directly.


Replies

Kye yesterday at 6:42 PM

That's probably what I was thinking of. I haven't kept up as much on non-text generative AI.