donpark • today at 1:39 AM • 1 reply

But I've read somewhere that the KV cache for a speech-to-speech model explodes in size with each turn, which could make on-device full-duplex S2S unusable for anything beyond quick chats.
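To put rough numbers on that, here is a back-of-the-envelope sketch of how a plain, uncompressed KV cache scales with conversation length. Every figure in it is an assumption for illustration, not a measurement of any real model: the layer count, KV head count, head dimension, fp16 storage, the 12.5 frames-per-second audio token rate, and the idea that a full-duplex model caches one token per frame for each of two streams (user and model). Under those assumptions the per-token cost is constant, so the cache grows linearly with elapsed audio rather than with turn count as such.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Size of an uncompressed KV cache for a decoder-only transformer.

    The leading factor of 2 covers one key tensor and one value tensor per layer.
    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


def tokens_for_audio(seconds, frames_per_second, streams=2):
    """Tokens cached for a stretch of audio.

    Assumes one token per frame per stream; streams=2 models full duplex
    (user audio plus model audio). Both are hypothetical simplifications.
    """
    return int(seconds * frames_per_second * streams)


if __name__ == "__main__":
    # Hypothetical model shape, chosen only to make the arithmetic concrete.
    layers, kv_heads, head_dim = 32, 8, 128
    frame_rate = 12.5  # assumed audio frames per second

    for minutes in (1, 5, 10, 30):
        tokens = tokens_for_audio(minutes * 60, frame_rate)
        size_mib = kv_cache_bytes(tokens, layers, kv_heads, head_dim) / 2**20
        print(f"{minutes:>2} min: {tokens:>6} tokens, {size_mib:8.1f} MiB")
```

Whether that is a dealbreaker on a phone depends on the real model shape, on cache quantization, and on whether the runtime evicts or windows old context, none of which this sketch models.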

Replies

tmzt • today at 2:38 AM

Gemini Nano is supposedly doing it on device. It looks like something similar should work with Apple GPU and ANE.
