LLMs can look back over a certain number (N) of tokens, which roughly correspond to words. For instance if you want to summarize or answer questions about a document accurately the length of the document has to be less than N.
Conventionally they use an attention mechanism that compares every token to every other token which has a cost of N*N or N squared which is quadratic. If you want LLMs to chew over a huge amount of context (all the source code for your project) it’s a problem so people are looking for ways around this.
LLMs can look back over a certain number (N) of tokens, which roughly correspond to words. For instance if you want to summarize or answer questions about a document accurately the length of the document has to be less than N.
Conventionally they use an attention mechanism that compares every token to every other token which has a cost of N*N or N squared which is quadratic. If you want LLMs to chew over a huge amount of context (all the source code for your project) it’s a problem so people are looking for ways around this.