Some LLM API endpoints do do compaction. OpenAI definitely supports serverside compaction, both explicit and automatic, and this is different from what could be implemented purely clientside: https://developers.openai.com/api/docs/guides/compaction (there were rumors on HN a few months ago about how activation-preserving/latent it is, versus plain summarization). Anthropic has it as well, in beta (new to me): https://platform.claude.com/docs/en/build-with-claude/compac...
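For contrast, here's a minimal sketch of what the purely clientside version looks like: once the history exceeds some budget, you summarize the oldest messages and splice the summary back in. All names here (`summarize`, `compact`, the thresholds) are hypothetical; the point is that this round-trips through text, whereas serverside compaction could in principle preserve latent/activation state.

```python
# Hypothetical clientside compaction: replace old messages with a summary
# once the history grows past a budget. This is NOT the OpenAI/Anthropic
# API; it's an illustration of the text-only approach they improve on.

def summarize(messages):
    # Stand-in for a real LLM summarization call.
    return {"role": "system",
            "content": "Summary of %d earlier messages." % len(messages)}

def compact(history, keep_last=4, budget=8):
    # Nothing to do while the conversation is short.
    if len(history) <= budget:
        return history
    # Keep the most recent turns verbatim; collapse the rest to a summary.
    old, recent = history[:-keep_last], history[-keep_last:]
    return [summarize(old)] + recent

history = [{"role": "user", "content": "msg %d" % i} for i in range(10)]
compacted = compact(history)
# 10 messages -> 1 summary message + the 4 most recent
```

Anything the summarizer drops is gone for good, which is exactly the lossiness an activation-preserving serverside mechanism would avoid.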