Yeah we are! The issue we're seeing is with controllability and hallucinations in speech-to-speech models, which we're still trying to work through.