For my grug brain, can somebody translate this into ELIgrug terms?
Does this mean I would be able to run a 500B model on my 48 GB MacBook without losing quality?
It's KV cache compression, so it reduces how much memory the model needs as its context grows. It does not affect the size of the weights.
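A rough back-of-the-envelope sketch of why these are separate budgets. The model shape below (80 layers, 8 KV heads, head dimension 128, 32k context) is a hypothetical config chosen only for illustration, not any specific model. The formulas assume plain weight quantization and a standard per-layer key/value cache.

```python
GB = 1e9

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights. KV cache compression does not change this."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bits_per_value: float) -> float:
    """Memory for the KV cache: one key and one value vector
    per layer, per KV head, per token of context."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bits_per_value / 8

# Weights of a 500B-parameter model, even quantized to 4 bits per weight.
w = weight_bytes(500e9, 4)

# KV cache for the hypothetical config above, at 16-bit vs compressed to 4-bit values.
kv16 = kv_cache_bytes(80, 8, 128, 32_768, 16)
kv4 = kv_cache_bytes(80, 8, 128, 32_768, 4)

print(f"weights @ 4-bit:  {w / GB:.1f} GB")
print(f"KV cache @ 16-bit: {kv16 / GB:.1f} GB")
print(f"KV cache @ 4-bit:  {kv4 / GB:.1f} GB")
```

Compressing the cache shrinks only the second and third numbers; the weights term is untouched, which is the part that decides whether a model fits in 48 GB at all.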