logoalt Hacker News

jexptoday at 5:16 PM3 repliesview on HN

Shouldn’t it be possible since forever to put machine readable source information into PDF metadata. It’s more a problem of the tools and programs generating the PDFs.

We spend millions turning structured information into PDFs and billions to extract the same data from a printer rendering language


Replies

vjvjvjvjghvtoday at 5:54 PM

Exactly. It’s pretty insane that we have converged on storing documents as PDF. And it looks like no work is done on making PDF files machine readable.

neonmagentatoday at 5:36 PM

Exactly. But we have no real coordination or uniform application in how we're creating PDFs across all these programs so we always end up with a fun mix of what will and wont be static, scalable, searchable