Hacker News

tjhorner, yesterday at 5:36 PM

Who says you need to pipe the entire document with JSON-LD directly into the context window? I agree, that is very wasteful. You can just parse the relevant bits out and convert the JSON-LD data into something like your txt format before presenting it to the LLM. Bake that right into whatever tool it uses to scrape websites.
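A minimal sketch of the "parse the relevant bits out" step, in a hedged form. It assumes the page embeds its data in `<script type="application/ld+json">` elements, and it assumes a plain `path: value` line format, because the thread never spells out what the "txt format" looks like. The names `JsonLdExtractor`, `flatten`, and `jsonld_to_text` are hypothetical, and only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies are delivered as raw data, so the JSON arrives intact.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def flatten(node, prefix=""):
    """Yield one 'dotted.path: value' line per scalar in a JSON-LD value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # the schema.org context URL carries no facts for the model
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from flatten(item, f"{prefix}{i}.")
    else:
        yield f"{prefix.rstrip('.')}: {node}"


def jsonld_to_text(html):
    """Hypothetical scraper hook: HTML in, compact text lines out."""
    parser = JsonLdExtractor()
    parser.feed(html)
    lines = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole scrape
        lines.extend(flatten(data))
    return "\n".join(lines)


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Product", "name": "Widget",'
        ' "offers": {"price": "9.99", "priceCurrency": "USD"}}'
        "</script></head><body>...</body></html>"
    )
    print(jsonld_to_text(page))
```

Run on the sample page, this prints four lines (`@type: Product`, `name: Widget`, `offers.price: 9.99`, `offers.priceCurrency: USD`), and only that text would reach the context window. Note that the full HTML still has to be fetched and parsed first.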


Replies

tsazan, yesterday at 5:46 PM

That solves the Token Tax. It does not solve the Bandwidth Tax. To get that JSON-LD, you still download 2 MB of HTML. You execute JS. You parse the DOM. You are buying a haystack to find a needle, then cleaning the needle. We propose serving just the needle. Furthermore, JSON-LD is strictly for facts. It cannot express @SEMANTIC_LOGIC. It lacks instructions on how to sell.