I'm not sure I understand the point of this as opposed to something like a json file, and also, assuming there is any type of structured format, why one would use an LLM for this task instead of a normal parser.
You assume JSON is a standalone file. It rarely is.
Even if it were, JSON is verbose. Every bracket and quote costs tokens.
In reality, the data is buried in 1MB+ of HTML. You download a haystack to find a needle.
We fetch a standalone text file. It cuts the syntax tax. It is pure signal.
You assume JSON is a standalone file. It rarely is.
Even if it were, JSON is verbose. Every bracket and quote costs tokens.
In reality, the data is buried in 1MB+ of HTML. You download a haystack to find a needle.
We fetch a standalone text file. It cuts the syntax tax. It is pure signal.