If there were an exponential cost, I would expect to see pricing based on that. I would also expect prompts to take exponentially longer to process. I don't believe LLMs work like that. The "scary quadratic" referenced in what you linked seems to just be pointing out that cache reads grow as the conversation gets longer?
If I'm running a database keeping track of a conversation, and each turn it rewrites the entire conversation history instead of appending a message, are we calling that O(N^2) now?
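To put rough numbers on that analogy (a throwaway Python sketch, the per-message token count is made up): each individual request only grows linearly with the history; it's the sum across the whole conversation that works out quadratic, which is all that "scary quadratic" framing amounts to.

    def total_tokens_processed(turns: int, tokens_per_message: int = 100) -> int:
        """Total tokens re-read if every turn re-sends the full history."""
        total, history = 0, 0
        for _ in range(turns):
            history += tokens_per_message  # append one new message
            total += history               # re-read the entire history so far
        return total

    for n in (10, 100, 1000):
        print(f"{n:>5} turns -> {total_tokens_processed(n):>12,} tokens total")
    #    10 turns ->        5,500 tokens total
    #   100 turns ->      505,000 tokens total
    #  1000 turns ->   50,050,000 tokens total
    # 10x the turns is ~100x the cumulative tokens: quadratic in total,
    # still linear per individual request, and nothing exponential anywhere.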