Hacker News

hiccuphippo · yesterday at 10:16 PM

The article says the LLM has to load 15540 tokens every time. I wonder if that can be reduced while retaining the context, maybe through deduplication, removing superfluous words, using shorter expressions with the same meaning, or things like that.
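One of the ideas floated here, deduplication, can be sketched in a few lines. This is purely illustrative (the comment doesn't specify a method): duplicate lines are dropped, and the token count is approximated by a whitespace word count, since the real count depends on the model's tokenizer.

```python
def dedupe_lines(prompt: str) -> str:
    """Drop exact duplicate lines, keeping the first occurrence."""
    seen = set()
    kept = []
    for line in prompt.splitlines():
        key = line.strip()
        if key and key in seen:
            continue  # skip a line already emitted
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)

def approx_tokens(text: str) -> int:
    """Very rough proxy for token count; real tokenizers differ."""
    return len(text.split())

# Hypothetical prompt with a duplicated instruction line.
prompt = "\n".join([
    "You are a helpful assistant.",
    "Rule: answer concisely.",
    "Rule: answer concisely.",
    "User question: what is 2 + 2?",
])
compressed = dedupe_lines(prompt)
print(approx_tokens(prompt), "->", approx_tokens(compressed))  # prints "18 -> 15"
```

More aggressive rewrites (shorter synonymous phrasings) would need a model or a curated substitution table, and risk changing meaning, which is presumably why the article's context is left verbatim.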