Curious how it does on multi-page scanned PDFs vs. single screenshots? The ORT vision/decoder split is the part that usually makes or breaks CPU VLM OCR...
Very cool, I'm building my own local-first product as well
Now this is legit cool, keep up the great work.
Curious how it does on multi-page scanned PDFs vs. single screenshots? The ORT vision/decoder split is the part that usually makes or breaks CPU VLM OCR...