Hacker News

wesammikhail, yesterday at 12:07 PM

Because in my mind, as someone who doesn't work directly on this kind of stuff, I figured caching was done much like any resource caching in a web server environment.

It's a semantics issue: the word "caching" is overloaded, and its meaning depends on context. For people who aren't familiar with the inner workings of LLMs, this can cause understandable confusion.