Shouldn’t it have been possible all along to put machine-readable source information into PDF metadata? It’s more a problem of the tools and programs generating the PDFs.
We spend millions turning structured information into PDFs and billions extracting the same data back out of a printer rendering language.
Exactly. But we have no real coordination or uniform practice in how PDFs get created across all these programs, so we always end up with a fun mix of what will and won't be static, scalable, or searchable.
Exactly. It’s pretty insane that we have converged on storing documents as PDF. And it looks like no work is being done on making PDF files machine readable.