
dlivingston · today at 2:53 AM

What is being discussed is KV caching [0], which is used by essentially every LLM to reduce per-token inference compute from O(n^2) to O(n) by reusing previously computed key/value tensors during decoding. This is not specific to Claude nor Anthropic.
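
For intuition, here's a rough single-head sketch of the idea in NumPy (toy shapes and made-up projection weights, not any particular model's implementation): each decode step only projects the new token's key/value and appends them to the cache, instead of recomputing K and V for the whole prefix.

```python
import numpy as np

d_model = 16
rng = np.random.default_rng(0)

# Hypothetical per-layer projection weights, just for illustration.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(x_t, cache):
    """Attend for one new token x_t, reusing cached keys/values.

    Without the cache, every step would recompute K and V for all
    previous tokens (O(n^2) work per step); with it, each step only
    projects the new token and attends over the cache (O(n) per step).
    """
    q = x_t @ W_q                      # query for the new token
    cache["k"].append(x_t @ W_k)       # store this token's key
    cache["v"].append(x_t @ W_v)       # store this token's value
    K = np.stack(cache["k"])           # (t, d_model)
    V = np.stack(cache["v"])           # (t, d_model)
    attn = softmax(q @ K.T / np.sqrt(d_model))
    return attn @ V                    # (d_model,)

cache = {"k": [], "v": []}
for t in range(5):                     # pretend autoregressive decode loop
    x_t = rng.standard_normal(d_model) # stand-in for the current token's embedding
    out = decode_step(x_t, cache)
print(out.shape)                       # (16,)
```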

[0]: https://huggingface.co/blog/not-lain/kv-caching