Hacker News

armcat · yesterday at 10:56 PM

This is an outstanding write-up, thank you! Regarding LLM latency, OpenAI recently introduced WebSocket support in their Responses client, so it should be a bit faster. An alternative is to run a very small LLM locally on your device. I built my own fully local pipeline and got sub-second RTT, with no streaming or optimisations: https://github.com/acatovic/ova
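The sub-second RTT claim above can be checked with a simple timing harness. This is a minimal sketch, not the linked repo's actual pipeline: `run_model` here is a hypothetical stand-in for a real local-LLM call (e.g. a small quantized model), so only the measurement pattern is what matters.

```python
import time

def measure_rtt(infer, prompt, runs=5):
    """Time repeated calls to an inference callable; return the best RTT in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(prompt)  # full round trip: prompt in, completion out
        timings.append(time.perf_counter() - start)
    return min(timings)  # best-case latency, ignoring warm-up noise

# Hypothetical stand-in for a local LLM call; swap in your real pipeline here.
def run_model(prompt):
    return "ok"

rtt = measure_rtt(run_model, "hello")
print(f"best RTT: {rtt * 1000:.3f} ms")
```

With a real local model substituted for `run_model`, the printed figure is what "sub-second RTT" refers to: the full prompt-to-completion round trip, measured on-device with no network hop.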


Replies

nicktikhonov · yesterday at 10:58 PM

Very cool! Starred and added to my reading list. Would love to chat and share notes, if you'd like.
