Yes, that is indeed O(N^2). Which, by the way, is not exponential. Also by the way, caching...

atq2119 • today at 5:40 AM • 1 reply • view on HN

Yes, that is indeed O(N^2). Which, by the way, is not exponential.

Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.

Replies

computably • today at 9:20 AM

> Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.

Touché. Still, to a reasonable approximation, caching makes the dominant term linear, or equiv, linearly scales the expensive bits.

alt Hacker News

Replies