valine, today at 8:44 AM

Yup, exactly. In principle it helps on both fronts: it speeds up inference by reducing memory bandwidth usage, and it shrinks the memory footprint of your KV cache.
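To put rough numbers on that, here is a back-of-the-envelope sketch. The model shape (32 layers, head dim 128, 4096-token context, fp16) is made up for illustration, and cutting the number of KV heads is only a stand-in for whatever technique shrinks the cache. Since each decode step has to read the whole cache, the bytes moved per token scale the same way as the footprint.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem, batch=1):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Hypothetical model shape, chosen only for illustration.
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                          seq_len=4096, bytes_per_elem=2)

# Same model with 4x fewer KV heads (an assumed stand-in for the cache reduction).
smaller = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                         seq_len=4096, bytes_per_elem=2)

print(f"baseline: {baseline / 2**20:.0f} MiB")  # 2048 MiB
print(f"smaller:  {smaller / 2**20:.0f} MiB")   # 512 MiB

# Each generated token reads the full cache once, so per-token memory
# traffic from the cache drops by the same factor as the footprint.
print(f"reduction: {baseline / smaller:.0f}x")  # 4x
```

This ignores weights and activations, which also consume bandwidth, so the end-to-end speedup will be smaller than the cache reduction alone suggests.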