Thank you so much. Right now, it cannot handle expressive tags. What kind of tags would be most helpful, in your opinion?
Narrowing it down, emotion-based tag control would be the most helpful. Tags like [sarcastically], [happily], [joyfully], [fearfully]: so a subset of adverbs.
A stretch goal is 'arbitrary tags', anything from [singing], [sung to the tune of {x}], [pausing for emphasis], [slowly decreasing speed for emphasis], [emphasizing the object of this sentence] to [clapping], [car crash in the distance], [laser's pew pew].
But yeah: instruction/control via [tags] is the deciding feature for me, provided prompt adherence is strong enough.
Also: a thought...
Everyone in this space is using [] for different kinds of tags, which is very simple. Maybe it makes sense to differentiate kinds of tags? I.e. [tags for modifying how text is spoken] vs {tags for creating sounds that aren't specifically speech: not modifying anything... but instead its own 'sound/word'}
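To make the [] vs {} idea concrete, here's a rough sketch of how a front end could split the two kinds of tags before anything reaches the model. This is only an illustration: the function name `parse_tagged_text` and the token labels "style", "sound" and "speech" are made up for this example and aren't taken from any existing TTS tool.

```python
import re
from typing import List, Tuple

# Assumed convention (hypothetical, from the suggestion above):
#   [..] modifies how the following text is spoken
#   {..} is a standalone non-speech sound, its own 'sound/word'
TOKEN_RE = re.compile(r"\[([^\[\]]+)\]|\{([^{}]+)\}|([^\[\]{}]+)")


def parse_tagged_text(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, value) tokens: 'style', 'sound' or 'speech'."""
    tokens: List[Tuple[str, str]] = []
    for match in TOKEN_RE.finditer(text):
        style, sound, speech = match.groups()
        if style is not None:
            tokens.append(("style", style.strip()))
        elif sound is not None:
            tokens.append(("sound", sound.strip()))
        elif speech.strip():
            # Plain text between tags; whitespace-only runs are dropped.
            tokens.append(("speech", speech.strip()))
    # Unbalanced brackets are silently skipped in this sketch.
    return tokens


if __name__ == "__main__":
    line = "[sarcastically] Oh, great. {car crash in the distance} [fearfully] What was that?"
    for kind, value in parse_tagged_text(line):
        print(f"{kind:>6}: {value}")
```

With that split, a "style" token would apply to the speech that follows it, while a "sound" token would be rendered as its own event. Nested cases like [sung to the tune of {x}] aren't handled here and would need a proper grammar.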
Intonation (frequency rise/fall) would offer a lot of versatility.