On just the transcription side:
WhisprFlow produces much better speech-to-text for long text messaging-by-voice (dictation / transcription) than apple's speech-to-text does. Whisper models in general seem to do a lot better than most built-into-OS/app models. Which is interesting, because there's nothing stopping them from just using Whisper models.
I love MacWhisper personally. Also, Gumroad is a fantastic app distribution platform for my personal values.
https://goodsnooze.gumroad.com/l/macwhisper
As far the "decision tree" side ... there's not much that can be done about that now. Agentic agents still go too "off-the-rails" to be productionized out to the billions of smartphones of the world. I'm working on voice-controlled agentic-with-rails AI features for my HomeAssistant, because Alexa / Google Home suck. But that's a hobby project and rogue AI actions only affect me, not billions of customers.