lxgr, yesterday at 10:36 PM

That said, the KV cache is very much not stateless, so internally, inference APIs will be highly incentivized to route each request to an instance that already has as much of its prefix cached as possible.
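
To make the routing idea concrete, here is a minimal sketch of prefix-aware routing. It is not how any particular provider does it: `Instance`, `shared_prefix_len`, and `route` are hypothetical names, prompts are plain lists of token IDs, and the router is assumed to know exactly which prefixes each instance has cached. A production router would presumably also have to weigh load and cache eviction, which this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical inference replica that remembers which token prefixes it has cached."""
    name: str
    cached: list[tuple[int, ...]] = field(default_factory=list)


def shared_prefix_len(a, b):
    """Count the leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(tokens, instances):
    """Pick the instance whose cache shares the longest prefix with the request.

    Ties (including "nobody has anything cached") go to the first instance in the list.
    """
    def best_hit(inst):
        return max((shared_prefix_len(tokens, c) for c in inst.cached), default=0)

    chosen = max(instances, key=best_hit)
    chosen.cached.append(tuple(tokens))  # assumption: serving a request caches its tokens
    return chosen
```

With instance `a` holding `(1, 2, 3)` and instance `b` holding `(1, 2, 3, 4, 5)`, a request for `[1, 2, 3, 4, 9]` goes to `b`, since four of its tokens are already cached there versus three on `a`.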