Hacker News

lostmsu today at 1:10 PM

How large is the KV cache?


Replies

xbar today at 1:47 PM

0.1 GB per full-attention layer, and "The model has 60 transformer layers: 45 GatedDeltaNet (linear attention) + 15 standard full attention." Only the full-attention layers hold a KV cache, so 15 × 0.1 GB ≈ 1.5 GB.
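
A quick back-of-the-envelope in Python. The per-layer formula (2 × kv_heads × head_dim × seq_len × dtype_bytes, one factor of 2 for K plus V) is the standard KV-cache size; the head count, head dim, context length, and dtype below are hypothetical values picked only so one layer lands near the quoted 0.1 GB:

    # Per the comment: only the 15 full-attention layers keep a KV cache;
    # the 45 GatedDeltaNet (linear attention) layers do not.
    num_full_attn_layers = 15

    # Hypothetical example values, not from the source:
    num_kv_heads = 2       # assumed GQA head count
    head_dim = 128         # assumed head dimension
    seq_len = 100_000      # assumed context length in tokens
    dtype_bytes = 2        # assumed bf16/fp16

    # K and V are each cached per token, per KV head, per layer.
    bytes_per_layer = 2 * num_kv_heads * head_dim * seq_len * dtype_bytes
    total_bytes = num_full_attn_layers * bytes_per_layer

    print(f"per layer: {bytes_per_layer / 1e9:.2f} GB")  # ~0.10 GB
    print(f"total:     {total_bytes / 1e9:.2f} GB")      # ~1.5 GB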