Look into SSML phoneme tags. Some TTS engines support them. That way you can use a powerful LLM to fix these issues ahead of the TTS step.
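
Here is a rough sketch of what that could look like. It assumes the LLM has already given you a word-to-IPA mapping; the `apply_phonemes` function and the `pronunciations` dict are made-up names for illustration, not part of any TTS library. The `<phoneme alphabet="ipa" ph="...">` element itself comes from the SSML spec, but which alphabets a given engine accepts varies, so check its docs.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_phonemes(text, pronunciations):
    """Wrap each known word in an SSML <phoneme> tag and return a <speak> document.

    `pronunciations` is a hypothetical word -> IPA mapping, e.g. produced by an LLM.
    """
    if not pronunciations:
        return "<speak>" + escape(text) + "</speak>"

    # Try longer entries first so they win over shorter entries they contain.
    words = sorted(pronunciations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so the result stays valid XML.
        parts.append(escape(text[last:match.start()]))
        word = match.group(0)
        ph = quoteattr(pronunciations[word])  # adds the surrounding quotes
        parts.append(f'<phoneme alphabet="ipa" ph={ph}>{escape(word)}</phoneme>')
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    # Example mapping; in practice this would come from the LLM pass.
    print(apply_phonemes("I said tomato & left", {"tomato": "təˈmeɪtoʊ"}))
```

The matching is case-sensitive and whole-word only, which keeps it simple; you would send the resulting string to the TTS engine in place of the raw text.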