Very cool! Starred and on my reading list. Would love to chat and share notes, if you'd like.
You may be interested in gemini-2.5-flash-preview-tts
Text in, audio out, so you can merge LLM+TTS into a single step (streamable)
https://ai.google.dev/gemini-api/docs/models/gemini-2.5-flas...
Also consider using Cerebras' inference APIs. They released a voice demo a while back, and their model inference latency is insanely low.