himata4113, yesterday at 10:22 PM

KV cache residency requirements, and the low latency needed for good throughput, call for good locality, but you're right that it could be split across different parts of a single datacenter. As I mentioned before, though, the weakest link is before the model is ever loaded onto the GPUs.

As for reverse engineering, I doubt it's something state-sponsored actors would struggle with for long.