That looks very interesting. It could use a demo or some examples for those of us with short attention spans. It would be cool to feed it into TTS or a video generation model like Sora.