Hacker News

CuriouslyC | yesterday at 4:51 PM | 2 replies

The only AI use case that cares about latency is interactive voice agents, where you ideally want a <200ms response time, and 100ms of network latency kills that. For coding and batch-job agents, anything under 1s isn't going to matter to the user.
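The budget math here can be sketched out. The component latencies below are illustrative assumptions, not measurements from the thread; the point is that a 100ms network hop consumes half of a 200ms target before any model work happens.

```python
# Illustrative end-to-end latency budget for one voice-agent turn.
# All component numbers are assumptions for the sketch.
BUDGET_MS = 200  # target response time from the comment

components_ms = {
    "network_round_trip": 100,  # the network latency the comment mentions
    "speech_to_text": 40,
    "llm_first_token": 80,
    "text_to_speech_start": 30,
}

total = sum(components_ms.values())
print(f"total: {total} ms, budget: {BUDGET_MS} ms, over by: {total - BUDGET_MS} ms")
```

Even with optimistic model-side numbers, the assumed pipeline lands at 250ms, 50ms over budget, which is why the comment treats 100ms of network latency as disqualifying for voice.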


Replies

electroly | yesterday at 5:05 PM

Tbh, that's a good point about voice agents that I hadn't considered. I guess there are some latency-sensitive inference workloads. Thanks for pointing that out.

coredog64 | yesterday at 7:25 PM

A customer service chatbot can require more than one LLM call per response, to the point that latency anywhere in the system starts to show up as a degraded end-user experience.
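The compounding effect above is simple multiplication when the calls run sequentially. A minimal sketch, where the call count, per-call latency, and pipeline stage names are all assumed for illustration:

```python
# Sketch: sequential LLM calls in one chatbot response compound latency.
# Call counts and per-call latencies are assumptions, not measurements.
def response_latency_ms(calls: int, per_call_ms: float, overhead_ms: float = 0.0) -> float:
    """Total latency of a chain of sequential calls plus fixed overhead."""
    return calls * per_call_ms + overhead_ms

# One 800ms call is tolerable in a chat UI; a hypothetical 4-call chain
# (intent classification, retrieval, drafting, guardrail check) at the
# same per-call latency crosses 3 seconds.
print(response_latency_ms(1, 800))  # 800.0
print(response_latency_ms(4, 800))  # 3200.0
```

This is why shaving latency anywhere in the stack matters for multi-call pipelines: every millisecond saved per call is paid back once per chained call.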