Hacker News

GaggiX, yesterday at 4:57 PM

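The model uses Gated DeltaNet and Gated Attention, so the memory usage of the KV cache is very low, even at BF16 precision. Only the Gated Attention layers keep a per-token KV cache; the Gated DeltaNet layers carry a fixed-size recurrent state that doesn't grow with context length.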
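To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not the model's actual config: the layer count, the 1-in-4 ratio of attention layers, the KV head count, the head dimension, and the `kv_cache_bytes` helper itself are all made up. It also ignores the small fixed-size state of the linear-attention layers.

```python
def kv_cache_bytes(num_attn_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    bytes_per_elem=2 corresponds to BF16. The leading factor of 2 counts
    one K tensor and one V tensor per attention layer.
    """
    return 2 * num_attn_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical config (NOT taken from the model card): 48 layers total,
# 2 KV heads of dimension 256, and a 32k-token context.
TOTAL_LAYERS = 48
KV_HEADS = 2
HEAD_DIM = 256
SEQ_LEN = 32_768

# Baseline: every layer is full attention and keeps a KV cache.
full = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

# Hybrid (assumed ratio): only 1 layer in 4 is full attention; the rest are
# linear-attention layers whose state does not grow with seq_len.
hybrid = kv_cache_bytes(TOTAL_LAYERS // 4, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"all-attention: {full / 2**30:.2f} GiB")
print(f"hybrid:        {hybrid / 2**30:.2f} GiB")
```

With these made-up numbers the cache shrinks in proportion to the share of layers that are full attention, and it still grows linearly with context length, just with a smaller constant.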