
bobmcnamara · yesterday at 9:16 PM

> Do LLMs blow the cache?

Sometimes very yes?

If you've got 1GB of weights, those bytes are coming through the caches on their way to the execution units somehow.

Many caches are smart enough to recognize these accesses as strided, streaming, heavily prefetchable, evictable reads, and optimize for that.
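To make that access pattern concrete, here's a minimal C sketch of a streaming pass over a weight buffer. The hardware prefetcher would normally detect this sequential pattern on its own; `__builtin_prefetch` (a GCC/Clang builtin) just makes the intent explicit, and its third argument marks the data as low temporal locality, i.e. fine to evict soon. The lookahead distance of 16 floats (one 64-byte cache line) is an illustrative choice, not a tuned value.

```c
#include <assert.h>
#include <stddef.h>

/* Stream through a large weight buffer once, hinting to the cache
   that each line is a read-only, soon-to-be-evicted streaming access. */
static float sum_weights(const float *w, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Hint: fetch ~one cache line (16 floats) ahead.
           Args: address, 0 = read, 0 = low temporal locality.
           Prefetching past the end is safe: invalid prefetch
           addresses do not fault. */
        __builtin_prefetch(&w[i + 16], 0, 0);
        acc += w[i];
    }
    return acc;
}
```

On most modern CPUs the explicit hint is redundant for a simple unit-stride loop, which is exactly the comment's point: the cache hierarchy already handles this pattern well.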

Many models are now quantized, too, to reduce the overall memory bandwidth needed for execution, which also helps with caching: int8 weights take a quarter of the bandwidth of float32.