Hacker News

Kye today at 4:48 PM

My understanding is that music generation works more like Stable Diffusion: it generates a waveform as an image, then turns that image into an audio file.


Replies

cubefox today at 4:55 PM

They do use diffusion models, but I don't think they would take a detour through images. They can generate audio directly with audio diffusion rather than image diffusion.
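For context on the image-based route Kye describes: in that approach the "image" is usually a magnitude spectrogram rather than a plot of the raw waveform, and because a spectrogram discards phase, turning it back into audio needs a phase-recovery step. The following is a minimal NumPy-only sketch of that last step using the Griffin-Lim algorithm. It is an illustration under stated assumptions, not how any particular music model works; the frame size, hop, and helper names (`stft`, `istft`, `griffin_lim`, `spectral_convergence`) are hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical parameters chosen for illustration only.
N_FFT = 512  # samples per analysis frame
HOP = 128    # samples between frame starts (75% overlap)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform: one row per Hann-windowed frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT via windowed overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    norm[norm < 1e-8] = 1.0  # avoid dividing by zero where the window vanishes
    return out / norm


def griffin_lim(magnitude, n_iter=50, seed=0):
    """Estimate a signal whose STFT magnitude matches `magnitude` (the 'image')."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        signal = istft(magnitude * phase)
        phase = np.exp(1j * np.angle(stft(signal)))  # keep phase, discard magnitude
    return istft(magnitude * phase)


def spectral_convergence(magnitude, signal):
    """Relative error between a target magnitude and a signal's STFT magnitude."""
    return np.linalg.norm(np.abs(stft(signal)) - magnitude) / np.linalg.norm(magnitude)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr // 2) / sr
    x = np.sin(2 * np.pi * 440.0 * t)  # 0.5 s test tone
    mag = np.abs(stft(x))              # the "image": phase is thrown away here
    y = griffin_lim(mag)
    print("spectral convergence:", spectral_convergence(mag, y))
```

The inverse STFT divides by the summed squared window, which gives the least-squares estimate that Griffin-Lim's iteration relies on; each pass keeps the target magnitude and only updates the phase. A model that generates audio directly, as cubefox suggests, would skip this reconstruction step entirely.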
