nilirl • today at 12:18 PM • 1 reply

One thing I've struggled with before is building a collection of data models based off of a collection of PDF forms.

I wanted to abstract away the PDF form by building my own HTML form on top of a data model that could later be used to fill the PDF programmatically.

Since I had hundreds of PDFs, I wanted an OCR+LLM pipeline to build a data model for each PDF. Unfortunately, OCR+LLM only works ~90% of the time; the rest of the time, fields are missed or mislabeled in the data model.
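To make the failure mode concrete, here is a rough sketch of the kind of check that flags those misses. It assumes the PDFs are fillable and that their real field names can be read out by some PDF library; `check_model`, the field names, and the 0.6 similarity cutoff are all made up for illustration.

```python
from difflib import get_close_matches


def check_model(model_fields, pdf_fields):
    """Compare LLM-extracted field names with the PDF's real field names."""
    model, pdf = set(model_fields), set(pdf_fields)
    missed = sorted(pdf - model)    # present in the PDF, absent from the model
    unknown = sorted(model - pdf)   # present in the model, absent from the PDF

    # An unknown name that closely resembles a missed one is probably a mislabel.
    likely_mislabeled = {}
    for name in unknown:
        match = get_close_matches(name, missed, n=1, cutoff=0.6)
        if match:
            likely_mislabeled[name] = match[0]

    return {
        "missed": missed,
        "unknown": unknown,
        "likely_mislabeled": likely_mislabeled,
    }


# Hypothetical example: one field misspelled, one field missed entirely.
report = check_model(
    model_fields=["first_name", "last_nme", "dob"],
    pdf_fields=["first_name", "last_name", "dob", "signature_date"],
)
print(report)
```

A check like this only catches name-level mismatches. It says nothing about a field that has the right name but the wrong meaning, which still needs a human look.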

Does this sometimes get it wrong during programmatic filling? How do you deal with that?

Replies

nip • today at 12:33 PM

[dead]