You're right, in my rage I typod, its really frustrating, even friends will text me and their text makes no sense, and 2 minutes later "STUPID VOICE TO TEXT" I have a few friends who drive trucks, so they need to be able to use their voice to communicate.
I have to say that OpenAI's Whisper model is excellent. If you could leverage that somehow I think it would really improve. I run it locally myself on an old PC with 3060 card. This way I can run whisper large which is still speedy on a GPU especially with faster-whisper. Added bonus is the language autodetection which is great because I speak 3 languages regularly.
I think there's even better models now but Whisper still works fine for me. And there's a big ecosystem around it.
Better speech transcription is cool, but that feels kinda contrived. Phone calls exist, so do voice messages sent via texting apps, and professional drivers can also just wait a bit to send messages if they really must be text; they're on the job, but if it's really that urgent they can pull over.