this misses a few key things but hits on many others
webrtc is a bad protocol, without a doubt. I do like websockets as an easy alternative, but you do need to reinvent decent portions of webrtc as a result
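just to illustrate what "reinvent decent portions of webrtc" tends to mean in practice, here's a rough TypeScript sketch of pushing mic audio over a plain WebSocket (the endpoint and the 20 ms PCM framing are made up): you immediately end up hand-rolling sequence numbers, timestamps and a jitter buffer, which is roughly the RTP machinery WebRTC gives you for free.

```ts
const ws = new WebSocket("wss://example.com/audio"); // made-up endpoint
ws.binaryType = "arraybuffer";

let seq = 0;

// Sender: wrap each 20 ms PCM frame in a tiny header before sending.
function sendFrame(pcm: Int16Array): void {
  const header = new DataView(new ArrayBuffer(8));
  header.setUint32(0, seq++);                   // sequence number
  header.setUint32(4, Date.now() & 0xffffffff); // capture timestamp, ms
  const packet = new Uint8Array(8 + pcm.byteLength);
  packet.set(new Uint8Array(header.buffer), 0);
  packet.set(new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength), 8);
  ws.send(packet);
}

// Receiver: reorder by sequence number and play slightly behind real time,
// i.e. a crude jitter buffer -- WebRTC's RTP stack does all of this for you.
const jitterBuffer = new Map<number, Int16Array>();
let nextToPlay = 0;

ws.onmessage = (ev: MessageEvent<ArrayBuffer>) => {
  const view = new DataView(ev.data);
  jitterBuffer.set(view.getUint32(0), new Int16Array(ev.data, 8));
};

// Called every 20 ms by the playback side (e.g. an AudioWorklet);
// returns null for a late or lost frame (loss concealment is your problem too).
function nextFrame(): Int16Array | null {
  const frame = jitterBuffer.get(nextToPlay) ?? null;
  jitterBuffer.delete(nextToPlay);
  nextToPlay += 1;
  return frame;
}
```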
I like the idea of MoQ but it's not widely used. probably worth experimenting with, especially as video enters the chat
> and then a GPU pretends to talk to you via text-to-speech
OpenAI is speech-to-speech; there is no TTS in voice mode
> It takes a minimum of 8* round trips (RTT) to establish a WebRTC connection
signalling can be done long ahead of time, though I don't see this mentioned in the OpenAI blog. I also saw some new webrtc extensions that should reduce setup time further
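a rough sketch of what I mean by signalling ahead of time (the /signal endpoint and STUN server are placeholders, not anything OpenAI documents): create the RTCPeerConnection and do the offer/answer exchange at page load, so most of those round trips are already paid before the user ever starts talking.

```ts
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.example.com:3478" }], // placeholder
});

// Reserve an audio m-line now so the offer is complete before we have a mic track.
const audio = pc.addTransceiver("audio", { direction: "sendrecv" });

async function preSignal(): Promise<void> {
  await pc.setLocalDescription(await pc.createOffer());

  // Hypothetical WHIP-ish endpoint: POST our SDP offer, get the answer back.
  // (trickle ICE / waiting for candidate gathering omitted to keep this short)
  const res = await fetch("https://example.com/signal", {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  // After this, ICE + DTLS can complete in the background; the handshake RTTs
  // are spent while the user is still reading the page.
}

// Run at page load, long before any conversation starts.
preSignal();

// When the user hits "talk", all that's left is swapping the real mic track in.
async function startTalking(): Promise<void> {
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  await audio.sender.replaceTrack(mic.getAudioTracks()[0]);
}
```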
ultimately though, it comes down to:
> It’s not like LLMs are particularly responsive anyway
I expect to see a shift in how S2S models work toward lower latency, along the lines of the new voice API models that OpenAI announced
to be fair, the new models were released the day after this MoQ blog was published
> OpenAI is speech-to-speech; there is no TTS in voice mode
Which results in the interesting situation where the transcript isn't what was said:
Q: Why do the voice transcripts sometimes not match the conversation I had?
A: Voice conversations are inherently multimodal, allowing for direct audio exchange between you and the model. As a result, when this audio is transcribed, the transcription might not always align perfectly with the original conversation.
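which matches the mechanism: the exchange itself is audio-to-audio, and the text you see later comes from a separate transcription pass over that audio. if you wanted to reproduce the effect yourself, something like this (OpenAI Node SDK with whisper-1; purely my assumption of how you'd do it, not a claim about what ChatGPT runs internally) gives you a transcript that is the ASR's best guess at the audio rather than anything the model "said" in text:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY from the environment

// Transcribe a saved voice clip after the fact. The text you get back is the
// ASR model's reading of the audio, not the S2S model's own output, so it can
// drift from what was actually said in the conversation.
async function transcriptAfterTheFact(path: string): Promise<string> {
  const result = await openai.audio.transcriptions.create({
    file: fs.createReadStream(path),
    model: "whisper-1",
  });
  return result.text;
}
```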