This is 128B dense though. The K/V cache on long context is going to be massive.

2001zhaozhao • yesterday at 5:47 PM • 2 replies

Havoc • yesterday at 6:30 PM

Don’t think KV cache size correlates with dense vs. MoE.
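
To make that concrete, here is a rough back-of-the-envelope sketch of how KV cache size is usually estimated for standard multi-head or grouped-query attention. The function name `kv_cache_bytes` and every number in the example are hypothetical and are not the configuration of the model discussed here. Note that nothing about the feed-forward layers (dense or expert count) appears in the formula. Architectures with compressed or windowed attention would need a different estimate.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for plain MHA/GQA attention.

    Each layer stores one K and one V tensor, each of shape
    (batch_size, num_kv_heads, seq_len, head_dim).
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    tensors_per_layer = 2  # K and V
    return (tensors_per_layer * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical config: 64 layers, 8 KV heads, head_dim 128, 128K-token context.
size = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128,
                      seq_len=131072)
print(f"{size / 2**30:.1f} GiB")  # 32.0 GiB
```

Under these assumed numbers the cache is 32 GiB per sequence at 16-bit precision, and it scales linearly with context length and with the number of KV heads.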

syntaxing • yesterday at 7:22 PM

With TurboQuant, you would reduce it by over 6x.
