> In order to have a chat with an LLM, the whole conversation history gets reprocessed every time - it is not just the last question and answer that get sent to the LLM, but all of the preceding back and forth.
Btw, context caching can overcome this, e.g. https://ai.google.dev/gemini-api/docs/caching . However, this means the (large) state has to be persisted on the server side, so it may have costs associated with it.
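
To illustrate the quoted point: with a typical stateless chat API, the client keeps the whole transcript and ships all of it on every turn, so the prompt the model has to reprocess grows with the length of the conversation. A minimal sketch, where `call_llm` is a hypothetical stand-in for any chat-completions endpoint:

```python
# Minimal sketch of a stateless chat loop: the ENTIRE message list is
# re-sent (and re-processed by the model) on every single turn.
# `call_llm` is a hypothetical stand-in for any chat-completions API.

def call_llm(messages: list[dict]) -> str:
    # In reality this would POST `messages` to the provider;
    # here we only show how much context gets shipped each time.
    total_chars = sum(len(m["content"]) for m in messages)
    print(f"sending {len(messages)} messages, ~{total_chars} chars of context")
    return "stub answer"

messages = [{"role": "system", "content": "You are a helpful assistant."}]

for question in ["What is an LLM?", "How big is its context?", "Why does it forget?"]:
    messages.append({"role": "user", "content": question})
    answer = call_llm(messages)  # the full history goes over the wire again
    messages.append({"role": "assistant", "content": answer})
```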
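
And a rough sketch of the cached variant, based on the linked Gemini caching docs. This assumes the `google-generativeai` Python SDK; the model name, TTL and the placeholder file are illustrative and the exact API surface may have changed since:

```python
# Rough sketch of explicit context caching per the linked Gemini docs.
# Model name, TTL and `big_transcript.txt` are placeholders.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# The large prefix you don't want to re-send on every turn.
long_context = open("big_transcript.txt").read()

# The cached content is persisted server-side for the TTL; that storage is what you pay for.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    system_instruction="Answer questions about the transcript.",
    contents=[long_context],
    ttl=datetime.timedelta(minutes=30),
)

# Subsequent turns only send the new question; the cached prefix is reused server-side.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("What was decided in the meeting?")
print(response.text)
```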