
dlivingston · today at 2:53 AM

What is being discussed is KV caching [0], which is used by essentially every LLM to reduce per-token inference compute from O(n^2) to O(n) by reusing previously computed key/value tensors during decoding. This is not specific to Claude nor Anthropic.
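
For intuition, here's a rough single-head sketch of the idea in NumPy (toy shapes and made-up projection weights, not any particular model's implementation): each decode step only projects the new token's key/value and appends them to the cache, instead of recomputing K and V for the whole prefix.

```python
import numpy as np

d_model = 16
rng = np.random.default_rng(0)

# Hypothetical per-layer projection weights, just for illustration.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(x_t, cache):
    """Attend for one new token x_t, reusing cached keys/values.

    Without the cache, every step would recompute K and V for all
    previous tokens (O(n^2) work per step); with it, each step only
    projects the new token and attends over the cache (O(n) per step).
    """
    q = x_t @ W_q                      # query for the new token
    cache["k"].append(x_t @ W_k)       # store this token's key
    cache["v"].append(x_t @ W_v)       # store this token's value
    K = np.stack(cache["k"])           # (t, d_model)
    V = np.stack(cache["v"])           # (t, d_model)
    attn = softmax(q @ K.T / np.sqrt(d_model))
    return attn @ V                    # (d_model,)

cache = {"k": [], "v": []}
for t in range(5):                     # pretend autoregressive decode loop
    x_t = rng.standard_normal(d_model) # stand-in for the current token's embedding
    out = decode_step(x_t, cache)
print(out.shape)                       # (16,)
```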

[0]: https://huggingface.co/blog/not-lain/kv-caching