> This generic answer from Wikipedia is not very helpful in this context. Actually, the general...

nateb2022 • last Thursday at 11:16 PM • 1 reply • view on HN

> This generic answer from Wikipedia is not very helpful in this context.

Actually, the general definition fits this context perfectly. In machine learning terms, a specific 'speaker' is simply a 'class.' Therefore, a model generating audio for a speaker it never saw during training is the exact definition of the Zero-Shot Learning problem setup: "a learner observes samples from classes which were not observed during training," as I quoted.

Your explanation just rephrases the very definition you dismissed.

Replies

woodson • last Thursday at 11:36 PM

From your definition:

> a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to.

That's not what happens in zero-shot voice cloning, which is why I dismissed your definition copied from Wikipedia.

➕ show 1 reply

alt Hacker News

Replies