Hacker News

Aurornis · yesterday at 3:42 PM · 3 replies

Unfortunately not with a reasonable context length.


Replies

regularfry · yesterday at 10:14 PM

I've got 139k context with the UD-Q4_K_XL quant on a 4090, with the KV cache quantized to q8_0 (ctk/ctv). Could probably squeeze out a little more, but that's enough for me for the moment.
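A back-of-the-envelope sketch (Python) of what the q8_0 cache quantization buys, assuming only llama.cpp's block_q8_0 layout of 32 int8 values plus one f16 scale per block:

    # Rough effect of q8_0 KV-cache quantization (llama.cpp's ctk/ctv setting).
    # block_q8_0 stores 32 int8 values plus one f16 scale: 34 bytes per
    # 32 elements (~1.06 bytes/elem) versus 2 bytes/elem for f16.
    F16_BYTES_PER_ELEM = 2.0
    Q8_0_BYTES_PER_ELEM = 34 / 32

    ratio = Q8_0_BYTES_PER_ELEM / F16_BYTES_PER_ELEM
    print(f"q8_0 cache is {ratio:.0%} the size of f16,")
    print(f"so about {1 / ratio:.2f}x more context in the same VRAM.")

That works out to roughly 53%, i.e. close to double the context in the same cache budget.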

kkzz99 · yesterday at 4:25 PM

It really depends on what you think a reasonable context length is, but I can get 50k-60k on a 4090.

GaggiX · yesterday at 4:57 PM

The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low even at BF16 precision: the DeltaNet layers keep a fixed-size recurrent state instead of per-token entries, so only the attention layers' cache grows with context length. A minimal sizing sketch follows below.
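To make that concrete, here's a rough sketch; every layer/head/dim number in it is a hypothetical placeholder rather than the model's real config, and only the split between caching and non-caching layers matters:

    # KV-cache sizing for a hybrid linear/full-attention stack at BF16.
    # NOTE: all architecture numbers are HYPOTHETICAL placeholders chosen
    # for illustration, not the real model's config.

    def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                       ctx_len: int, bytes_per_elem: int = 2) -> int:
        # Each full-attention layer caches a K and a V tensor of shape
        # [ctx_len, kv_heads, head_dim]; linear-attention (DeltaNet) layers
        # hold a fixed-size recurrent state instead, so they're excluded.
        return 2 * attn_layers * kv_heads * head_dim * ctx_len * bytes_per_elem

    layers, ctx = 48, 128_000                           # hypothetical
    dense = kv_cache_bytes(layers, 8, 128, ctx)         # every layer attends
    hybrid = kv_cache_bytes(layers // 4, 8, 128, ctx)   # 1 attention layer in 4

    print(f"all-attention: {dense / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.1f} GiB")

With these placeholder numbers the hybrid cache is about a quarter of the all-attention one (~5.9 GiB vs ~23.4 GiB); the real ratio depends on what fraction of layers use full attention.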