polynomial, today at 12:43 AM

BUILD AI has a post about this, in particular about sharding the KV cache across GPUs and how the network is the new memory hierarchy:

https://buildai.substack.com/p/kv-cache-sharding-and-distrib...