nicktikhonov, yesterday at 10:58 PM

Very cool! Starred and on my reading list. Would love to chat and share notes, if you'd like.


Replies

riquito, today at 6:05 AM

You may be interested in gemini-2.5-flash-preview-tts

Text in, audio out, so you can merge the LLM and TTS stages into a single step (streamable).

https://ai.google.dev/gemini-api/docs/models/gemini-2.5-flas...

alfalfasprout, today at 12:39 AM

Also consider using Cerebras' inference APIs. They released a voice demo a while back, and the latency of their model inference is insanely low.
