One of the core features I look for is expressive control,
either through the API, with pitch/speed/volume parameters for more deterministic control,
or through expressive tags such as [coughs], [urgently], or [laughs in melodic ascending and descending arpeggiated gibberish babbles].
The 25MB model is amazingly good for its size. How does it handle expressive tags?
Thank you so much. Right now, it cannot handle expressive tags. Which tags do you think would be most helpful?