Hacker News

2001zhaozhao yesterday at 5:47 PM

This is 128B dense though. The K/V cache on long context is going to be massive.


Replies

Havoc yesterday at 6:30 PM

Don’t think KV cache size correlates with dense vs. MoE.
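
A rough sketch of why, assuming a standard transformer with grouped-query attention: the cache grows with layer count, KV heads, head dimension, context length, and bytes per element, and none of those terms says whether the feed-forward blocks are dense or MoE. The config numbers below are hypothetical, not this model's actual configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size in bytes for a standard attention stack."""
    # Factor of 2: one tensor for keys and one for values per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


# Hypothetical config: 80 layers, 8 KV heads of dim 128, 128K context, fp16.
size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                      seq_len=131072)
print(f"{size / 2**30:.1f} GiB")
```

A big dense model may still end up with a large cache, but that comes from having many layers or KV heads, not from being dense.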

syntaxing yesterday at 7:22 PM

With TurboQuant, you would reduce it by over 6x.
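
Back-of-the-envelope arithmetic for what "over 6x" implies, assuming the baseline is a 16-bit cache (the comment does not state the baseline) and ignoring any scale or codebook overhead a real quantizer would add. The function names are made up for illustration.

```python
def compression_ratio(baseline_bits, quantized_bits):
    """How many times smaller the cache gets per stored value."""
    return baseline_bits / quantized_bits


def max_bits_for_ratio(baseline_bits, ratio):
    """Largest bits-per-value budget that still reaches the given ratio."""
    return baseline_bits / ratio


# Assumed fp16 baseline: a 6x cut needs fewer than ~2.67 bits per value.
print(max_bits_for_ratio(16, 6))
```

So a 6x reduction from fp16 means storing each cached value in well under 3 bits on average.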