> In order to have a chat with an LLM, the whole conversation history gets reprocessed every time - it is not just the last question and answer that get sent to the LLM, but all of the preceding back and forth.
Btw, context caching can overcome this, e.g. https://ai.google.dev/gemini-api/docs/caching . However, this means the (large) state has to be persisted on the server side, so it may have costs associated with it.
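
To illustrate the quoted point: with a typical stateless chat API, the client keeps the whole transcript and ships all of it on every turn, so the prompt the model has to reprocess grows with the length of the conversation. A minimal sketch, where `call_llm` is a hypothetical stand-in for any chat-completions endpoint:

```python
# Minimal sketch of a stateless chat loop: the ENTIRE message list is
# re-sent (and re-processed by the model) on every single turn.
# `call_llm` is a hypothetical stand-in for any chat-completions API.

def call_llm(messages: list[dict]) -> str:
    # In reality this would POST `messages` to the provider;
    # here we only show how much context gets shipped each time.
    total_chars = sum(len(m["content"]) for m in messages)
    print(f"sending {len(messages)} messages, ~{total_chars} chars of context")
    return "stub answer"

messages = [{"role": "system", "content": "You are a helpful assistant."}]

for question in ["What is an LLM?", "How big is its context?", "Why does it forget?"]:
    messages.append({"role": "user", "content": question})
    answer = call_llm(messages)  # the full history goes over the wire again
    messages.append({"role": "assistant", "content": answer})
```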
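
And a rough sketch of the cached variant, based on the linked Gemini caching docs. This assumes the `google-generativeai` Python SDK; the model name, TTL and the placeholder file are illustrative and the exact API surface may have changed since:

```python
# Rough sketch of explicit context caching per the linked Gemini docs.
# Model name, TTL and `big_transcript.txt` are placeholders.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# The large prefix you don't want to re-send on every turn.
long_context = open("big_transcript.txt").read()

# The cached content is persisted server-side for the TTL; that storage is what you pay for.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    system_instruction="Answer questions about the transcript.",
    contents=[long_context],
    ttl=datetime.timedelta(minutes=30),
)

# Subsequent turns only send the new question; the cached prefix is reused server-side.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("What was decided in the meeting?")
print(response.text)
```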