computably · today at 9:20 AM

> Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.

Touché. Still, to a reasonable approximation, caching makes the dominant per-token cost linear, or equivalently, it linearly scales the expensive bits: each decoding step attends once over the cached keys and values instead of recomputing the whole prefix, so step t costs O(t) in context length, and the quadratic only shows up summed across the full generation.
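
To make that concrete, here's a toy sketch in NumPy (the single-head `attention` helper and the shapes are illustrative, not any library's API): with a KV cache, step t runs one query against t cached key/value rows, which is O(t·d) work, so the total over n tokens is 1 + 2 + ... + n = n(n+1)/2, i.e. still quadratic overall but linear per step.

```python
import numpy as np

def attention(q, K, V):
    # One new query against all cached keys/values:
    # O(t * d) work for t cached tokens of width d.
    scores = K @ q / np.sqrt(len(q))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d, n = 8, 5
rng = np.random.default_rng(0)
K_cache = np.empty((0, d))  # grows by one row per decoded token
V_cache = np.empty((0, d))

for t in range(1, n + 1):
    q = rng.normal(size=d)  # the new token's query
    K_cache = np.vstack([K_cache, rng.normal(size=(1, d))])
    V_cache = np.vstack([V_cache, rng.normal(size=(1, d))])
    _ = attention(q, K_cache, V_cache)
    print(f"step {t}: attended over {t} cached tokens")  # cost ~ O(t * d)
```

Without the cache, each step would recompute keys and values for the entire prefix, so step t would cost O(t²·d) and the total would be cubic; caching is what brings the total back down to the quadratic the parent describes.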