armcat • yesterday at 10:56 PM • 1 reply

This is an outstanding write-up, thank you! Regarding LLM latency, OpenAI recently introduced WebSockets in their Responses client, so it should be a bit faster. An alternative is to run a super small LLM locally on your device. I built my own fully local pipeline and it had sub-second RTT, with no streaming or optimisations: https://github.com/acatovic/ova

Replies

nicktikhonov • yesterday at 10:58 PM

Very cool! Starred and on my reading list. Would love to chat and share notes, if you'd like.
