
Kye • today at 4:48 PM • 1 reply • view on HN

My understanding is that music generation is more like Stable Diffusion: it generates a waveform as an image, then turns that into an audio file.

Replies

cubefox • today at 4:55 PM

They do use diffusion models, but I don't think they would make a detour via images. They can just generate audio directly with audio diffusion rather than image diffusion.
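To make the "detour via images" concrete, here is a rough NumPy sketch, not any real model's pipeline. It assumes the "image" is a magnitude spectrogram; the helper names `stft` and `istft`, the frame sizes, and the test tone are all made up for illustration. The point is that the image-like array drops phase, so turning it back into audio needs an extra phase-recovery step (Griffin-Lim or a vocoder, for example), whereas a model that works on audio directly outputs a 1-D array of samples.

```python
import numpy as np

N_FFT, HOP = 256, 64  # illustrative frame size and hop, not from any real model

def stft(x, n_fft=N_FFT, hop=HOP):
    """Windowed short-time Fourier transform: 1-D audio -> 2-D complex array."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.array([x[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def istft(spec, n_fft=N_FFT, hop=HOP):
    """Overlap-add inverse of stft(), normalised by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for k, frame in enumerate(frames):
        s = k * hop
        out[s:s + n_fft] += frame * window
        norm[s:s + n_fft] += window ** 2
    # Avoid dividing by ~0 at the very edges, where the window vanishes.
    return out / np.where(norm > 1e-3, norm, 1.0)

# One second of a 440 Hz tone at 8 kHz stands in for "generated audio".
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

spec = stft(x)
image = np.abs(spec)   # the image-like 2-D array: magnitudes only, phase is gone

exact = istft(spec)    # with the true phase, the round trip recovers the audio
naive = istft(image)   # magnitudes alone (zero phase) do not

inner = slice(N_FFT, -N_FFT)  # compare away from the edges
print("image shape:", image.shape, "audio shape:", x.shape)
print("max error with phase:   ", np.max(np.abs(exact[inner] - x[inner])))
print("max error without phase:", np.max(np.abs(naive[inner] - x[inner])))
```

An audio diffusion model skips the 2-D array and the phase problem entirely, since its output already has the shape of `x`.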

