LLMs can look back over a certain number (N) of tokens, which roughly correspond to words. For ins...

PaulHoule • 06/16/2025 • 2 replies • view on HN

LLMs can look back over a certain number (N) of tokens, which roughly correspond to words. For instance if you want to summarize or answer questions about a document accurately the length of the document has to be less than N.

Conventionally they use an attention mechanism that compares every token to every other token which has a cost of N*N or N squared which is quadratic. If you want LLMs to chew over a huge amount of context (all the source code for your project) it’s a problem so people are looking for ways around this.

Replies

zoklet-enjoyer • 06/16/2025

Thank you for that explanation

➕ show 1 reply

Icko_ • 06/16/2025

Not even that. With KV-caching, it's linear with the size of the context; and if someone figured out a way to have e.g. NlogN complexity, I imagine with KV-caching it may go down to logN complexity. (If the new algorithm permits that.)

➕ show 1 reply

alt Hacker News

Replies