Who says you need to pipe the entire document with JSON-LD directly into the context window? I agree, that is very wasteful. You can just parse the relevant bits out and convert the JSON-LD data into something like your txt format before presenting it to the LLM. Bake that right into whatever tool it uses to scrape websites.
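For what it's worth, here is a rough sketch of that parse-and-convert step in Python, standard library only. The names `JsonLdExtractor`, `flatten`, and `jsonld_to_text` are made up for illustration, and the flat `key: value` output is just a guess at what a compact text format could look like, not your actual txt format.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (bare attributes), hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the scrape


def flatten(node, prefix=""):
    """Yield 'dotted.key: value' lines for nested JSON-LD data."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # vocabulary URL, assumed to be noise for the LLM
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from flatten(item, f"{prefix}{index}.")
    else:
        yield f"{prefix.rstrip('.')}: {node}"


def jsonld_to_text(html):
    """Return only the JSON-LD facts from a page, as compact text lines."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(line for block in parser.blocks for line in flatten(block))
```

Run over a page whose JSON-LD describes a product, this hands the LLM one short `key: value` line per field and none of the surrounding markup.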
That solves the Token Tax. It does nothing for the Bandwidth Tax. To get at that JSON-LD, you still download 2 MB of HTML. You may have to execute JS. You parse the DOM. You are buying a haystack to find a needle, then cleaning the needle. We propose serving just the needle. Furthermore, JSON-LD is strictly for facts. It cannot express @SEMANTIC_LOGIC, so it lacks the instructions on how to sell.