lukax • yesterday at 10:25 PM

Or you could use Soniox Real-time (60 languages), which natively supports endpoint detection: the model is trained to figure out when a user's turn has ended. This always works better than VAD.

https://soniox.com/docs/stt/rt/endpoint-detection

Soniox also wins the independent benchmarks done by Daily, the company behind Pipecat.

https://www.daily.co/blog/benchmarking-stt-for-voice-agents/

You can try a demo on the home page:

https://soniox.com/

Disclaimer: I used to work for Soniox.

Edit: I commented too soon. I only saw VAD and immediately thought of Soniox, which was the first service to implement real-time endpoint detection last year.

Replies

nicktikhonov • yesterday at 10:29 PM

If you read the post, you'll see that I used Deepgram's Flux. It also does endpointing and is a higher-level abstraction than VAD.

satvikpendem • today at 2:54 AM

I'm using them. What has it been like working there? I see they have some consumer products as well. I wonder how they get state-of-the-art results at such low prices compared to the competition.
