Hacker News

amelius, 12/08/2024, 0 replies

In this case, the algorithm can determine broad classes like "rural" or "city", but beyond those classes the generated images have little connection with the audio. I think most DL researchers would agree that this is low-effort stuff, and therefore not publish-worthy. In addition, the word "accurate" in the title is misleading.