Hacker News

wild_egg · today at 2:23 AM · 7 replies

The baseline configurations all note <2s and <3s times. I haven't tried any voice AI stuff yet, but a 3s latency waiting on a reply seems rage-inducing if you're actually trying to accomplish something.

Is that really where SOTA is right now?


Replies

dnackoul · today at 5:15 AM

I've generally observed latency of 500ms to 1s with modern LLM-based voice agents making real calls. That's good enough to have real conversations.

I attended VAPI Con earlier this year, and a lot of the discussion centered on how interruptions and turn detection are the next frontier in making voice agents smoother conversationalists. Knowing when to speak is a hard problem even for humans, but when you listen to a lot of voice agent calls, the friction point right now tends to be either interrupting too often or waiting too long to respond.
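To make that tradeoff concrete: the classic baseline is a fixed silence timeout on top of voice activity detection, and that single knob produces both failure modes. A minimal sketch, with the timeout and frame size as arbitrary assumptions (this is not what Deepgram or anyone else actually ships):

```python
# Naive endpointing: declare end-of-turn after N ms of continuous silence.
# Too short a timeout interrupts people mid-pause; too long adds dead air
# before every reply -- exactly the two failure modes described above.

SILENCE_TIMEOUT_MS = 700   # assumed value; tuning this is the whole problem
FRAME_MS = 20              # typical VAD frame length

def end_of_turn(frames, is_speech) -> bool:
    """frames: iterable of audio frames; is_speech: VAD callable -> bool."""
    silence_ms = 0
    for frame in frames:
        if is_speech(frame):
            silence_ms = 0                    # any speech resets the countdown
        else:
            silence_ms += FRAME_MS
            if silence_ms >= SILENCE_TIMEOUT_MS:
                return True                   # caller done; agent may speak
    return False                              # stream ended mid-turn
```

Semantic turn-detection models aim to replace that fixed timeout with a prediction of whether the speaker is actually finished, e.g. telling a mid-sentence pause apart from a completed question.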

The major players are clearly working on this. Deepgram announced a new SOTA (Flux) for turn detection at the conference. Feels like an area where we'll see even more progress in the next year.

russdill · today at 6:50 AM

Been experimenting with having a local Home Assistant agent include a Qwen 0.5B model to provide a quick response indicating that the agent is "thinking" about the request. It seems to work OK for the use case, but it feels like it'd get really repetitive in a two-way conversation. Another way to handle this would be to have the small model provide the first 3-5 words of a (non-committal) response and feed that in as part of the prompt to the larger model.
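Roughly like this, as a toy asyncio sketch: the model calls are stubbed with sleeps, and every name here (`small_llm`, `large_llm`, `speak`) is made up rather than any real Home Assistant or inference API.

```python
import asyncio

# Hypothetical stand-ins for real local inference and TTS; swap in e.g.
# llama-cpp-python or an Ollama client for actual model calls.
async def small_llm(prompt: str) -> str:
    await asyncio.sleep(0.1)   # a 0.5B model answers almost instantly
    return "Sure, let me check..."

async def large_llm(prompt: str) -> str:
    await asyncio.sleep(2.0)   # the big model is the slow path
    return "the living room lights are already off."

def speak(text: str) -> None:
    print(f"[TTS] {text}")

async def respond(user_text: str) -> str:
    # The small model drafts the first few non-committal words, which are
    # spoken immediately to mask the big model's think time.
    opener = await small_llm(
        f"First 3-5 words of a non-committal spoken reply to: {user_text}")
    speak(opener)

    # The opener is fed into the big model's prompt so the full reply
    # continues the sentence instead of starting a second one.
    rest = await large_llm(
        f'User: {user_text}\nAssistant (continue from "{opener}"):')
    speak(rest)
    return f"{opener} {rest}"

asyncio.run(respond("turn off the living room lights"))
```

The prefix trick matters because two independently generated responses tend to contradict or repeat each other; constraining the big model to continue the opener keeps it sounding like one speaker.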

duckkg5 · today at 3:23 AM

Absolutely not.

500-1000ms is borderline acceptable.

Sub-300ms is closer to SOTA.

2000ms or more means people will hang up.
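Those tiers fall out of the per-stage budget of a cascaded ASR → LLM → TTS pipeline. A back-of-the-envelope illustration, where every figure is an assumption rather than a measurement:

```python
# Illustrative end-to-end budget for a cascaded voice pipeline; all
# numbers are assumptions for the sake of the arithmetic.
budget_ms = {
    "endpointing":     300,  # silence timeout before the agent may speak
    "asr_finalize":    100,  # ASR emitting the final transcript
    "llm_ttft":        300,  # LLM time to first token
    "tts_first_audio": 150,  # TTS time to first audio chunk
    "network":         100,  # round trips between services
}
print(sum(budget_ms.values()), "ms")  # -> 950 ms: the "borderline" tier
```

Under that kind of budget, getting below 300 ms total pretty much forces a single speech-to-speech model or very aggressive streaming, since endpointing alone can eat the whole allowance.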

coderintherye · today at 3:04 AM

Microsoft Foundry's realtime voice API (which itself wraps AI models from the major players) has response times measured in milliseconds.

echelon · today at 6:14 AM

Sesame was the fastest model for a bit. Not sure what that team is doing anymore; they've kind of gone radio silent.

https://app.sesame.com/

wellthisisgreat · today at 2:49 AM

No, there are models with sub-second latency for sure.