> This will reduce token count, and with it performance and operational costs.
How? The models aren't trained on compressed text tokens, nor could they be, if I understand it correctly. The model would have to decompress the prompt back to raw text before running it through.
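(For what it's worth, here's a minimal sketch of why this is hard, assuming the `tiktoken` package; any BPE tokenizer behaves similarly. BPE tokenization is already a rough lossless compression step over raw text, while something like zlib output is an opaque byte stream the model was never trained on:)

```python
# Sketch: compare raw characters, BPE tokens, and zlib bytes.
# Assumes `pip install tiktoken`; "cl100k_base" is just one example encoding.
import zlib
import tiktoken

text = "The quick brown fox jumps over the lazy dog. " * 20

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(text)                     # BPE merges multi-char sequences into single tokens
compressed = zlib.compress(text.encode("utf-8"))

print(f"characters: {len(text)}")             # raw text length
print(f"BPE tokens: {len(tokens)}")           # already ~4 chars/token on English text
print(f"zlib bytes: {len(compressed)}")       # smaller still, but meaningless to the model
```

So a compressed prompt would have to be decompressed before tokenization anyway, unless the model itself were trained on the compressed representation, which is what's being asked about below.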
That is what I am looking for: (a) LLMs trained on compressed text tokens, and (b) prompts sent in compressed form. I don't know how, but that is what I was hoping for.