Hacker News

Bombthecat, yesterday at 8:27 AM

You still need to hold the model in memory. If you have, for example, 16 GB of RAM, the gains aren't that large.


Replies

anon373839, yesterday at 8:43 AM

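The model weights aren't what consumes the most memory at scale. The weights are loaded once and shared by every request, but the KV caches are per-user, so cache memory grows with the number of concurrent users and their context lengths.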
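
To put rough numbers on the weights-versus-cache point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not something stated in the thread: a hypothetical 7B-parameter model with 32 layers, 32 KV heads, a head dimension of 128, 16-bit weights and cache entries, and 4096 tokens of context per user. It uses the usual estimate of one key vector and one value vector per layer per KV head for each cached token.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache needed for one token of one sequence."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(n_users, context_len, **cfg):
    """Total KV cache when every user holds a full context (worst case)."""
    return n_users * context_len * kv_bytes_per_token(**cfg)


def weight_bytes(n_params, bytes_per_param=2):
    """Memory for the weights, which are loaded once and shared by all users."""
    return n_params * bytes_per_param


# Hypothetical model shape (assumed, not taken from the thread).
cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2)
weights = weight_bytes(7_000_000_000)

for users in (1, 8, 64):
    cache = kv_cache_bytes(users, 4096, **cfg)
    print(f"{users:>3} users: weights {weights / GIB:.1f} GiB, "
          f"KV cache {cache / GIB:.1f} GiB")
```

Under these assumptions the cache costs 0.5 MiB per token, or 2 GiB per user at full context, so 8 concurrent users already need 16 GiB of cache against roughly 13 GiB of weights. Grouped-query attention (fewer KV heads), shorter contexts, or cache quantization would shrink the per-user figure, which is why the real ratio depends heavily on the model and the workload.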