Are you looking into speech-to-speech (no text) models?
Yeah, we are! The issues we're seeing with speech-to-speech models are controllability and hallucinations, which we're still working through.