
himata4113 • yesterday at 10:22 PM

KV cache residency requirements and general latency mean that good throughput needs good locality. You're right that the model could be split across different parts of a single datacenter, but as I mentioned before, the weakest link comes before the model is ever loaded onto the GPUs.

As for reverse engineering, I doubt it's something that state-sponsored actors would struggle with for too long.
