Hacker News

Nanonets-OCR-s – OCR model that transforms documents into structured markdown

276 points by PixelPanda on 06/16/2025 | 63 comments

Comments

PixelPanda 06/16/2025

Full disclosure: I work at Nanonets

Excited to share Nanonets-OCR-s, a powerful and lightweight (3B) VLM that converts documents into clean, structured Markdown. The model is trained to understand document structure and content context (tables, equations, images, plots, watermarks, checkboxes, etc.).

Key features:

LaTeX Equation Recognition: Converts inline and block-level math into properly formatted LaTeX, distinguishing between $...$ and $$...$$.

Image Descriptions for LLMs: Describes embedded images using structured <img> tags. Handles logos, charts, plots, and so on.

Signature Detection & Isolation: Finds and tags signatures in scanned documents, outputting them in <signature> blocks.

Watermark Extraction: Extracts watermark text and stores it within a <watermark> tag for traceability.

Smart Checkbox & Radio Button Handling: Converts checkboxes to Unicode symbols like ☐, ☑, and ☒ for reliable parsing in downstream apps.

Complex Table Extraction: Handles multi-row/column tables, preserving structure and outputting both Markdown and HTML formats.
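For anyone wiring this into a downstream pipeline, here is a minimal sketch of how the tagged blocks listed above might be pulled out of the generated Markdown. It is not taken from the model's own tooling: the function name `extract_tagged_blocks`, the sample text, and the assumption that each tag appears as a paired `<tag>...</tag>` block are all illustrative.

```python
import re

# Tag names come from the feature list above. The paired open/close form
# (<tag>...</tag>) is an assumption about how the output is laid out.
TAGS = ("img", "signature", "watermark")


def extract_tagged_blocks(markdown: str) -> dict[str, list[str]]:
    """Return the text inside each tagged block, keyed by tag name."""
    found: dict[str, list[str]] = {}
    for tag in TAGS:
        # Non-greedy match so adjacent blocks are not merged; DOTALL lets
        # a block span several lines.
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        found[tag] = [body.strip() for body in pattern.findall(markdown)]
    return found


# Hypothetical model output, made up for illustration only.
sample = """# Invoice

<watermark>CONFIDENTIAL</watermark>

<img>Bar chart of monthly revenue</img>

Approved by:
<signature>Jane Doe</signature>
"""

if __name__ == "__main__":
    for tag, bodies in extract_tagged_blocks(sample).items():
        print(tag, bodies)
```

A regex is enough here only because the tags are assumed to be flat and unnested; if the real output nests tags or adds attributes, a proper HTML parser would be the safer choice.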

Huggingface / GitHub / Try it out: https://huggingface.co/nanonets/Nanonets-OCR-s

Try it with Docext in Colab: https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...

temp0826 06/16/2025

I have a Shipibo (indigenous Peruvian language) to Spanish dictionary that I've been trying to translate into a Shipibo to English dictionary using a couple of different LLMs, but I keep struggling with the formatting (two columns, strange line breaks, and both Shipibo and Spanish in the definitions make it difficult to grok). On top of that, it's pretty poorly scanned. May need to give this a try.

el_don_almighty 06/16/2025

I have been looking for something that would ingest a decade of old Word and PowerPoint documents and convert them into a standardized format where the individual elements could be repurposed for other formats. This seems like a critical building block for a system that would accomplish this task.

Now I need a catalog, archive, or historian function that archives and pulls the elements easily. Amazing work!

ks2048 06/16/2025

It’s a shame all these models target markdown and not something with more structure and a specification. There are different flavors of Markdown and limited support for footnotes, references, figures, etc.

ZQ-Dev8 06/16/2025

How's this compare with docling (https://github.com/docling-project/docling)?

mvac 06/16/2025

How does it compare to Datalab/Marker https://github.com/datalab-to/marker ? We evaluated many PDF->MD converters and this one performed the best, though it is not perfect.

kordlessagain 06/16/2025

I created a PowerShell script to run this locally on any PDF: https://gist.github.com/kordless/652234bf0b32b02e39cef32c71e...

It does work, but it is very slow on my older GPU (Nvidia 1080 8GB). I would say it's taking at least 5 minutes per page right now, but maybe more.

Edit: If anyone is interested in trying a PDF to Markdown conversion utility built on this and hosted on Cloud Run (with GPU support), let me know. It should be done in about an hour or so, and I will post a link here when it's ready.

Bestora 06/16/2025

How does it handle documents with multi-column or multi-row tables?

e.g. https://www.japanracing.de/Teilegutachten/Teilegutachten-JR1... (page 1 for rowspan, page 29 for colspan)

silversmith 06/16/2025

I'm curious, how does it do with non-English texts? It's my understanding that LLM-based OCR solutions fall way behind traditional ones once you introduce other languages.

raus22 06/16/2025

With models like these, when multilingual support is not mentioned, they tend to perform really badly on real-life non-English PDFs.

nehalem 06/16/2025

How does it do with multi-column text and headers and footers?

progval 06/16/2025

It's not open-source (nor open-weight): https://huggingface.co/nanonets/Nanonets-OCR-s/discussions/2

tensor 06/16/2025

Are there no benchmarks or accuracy measures on a held-out set?

constantinum 06/16/2025

It would be interesting to know how it compares with LlamaParse, LLMWhisperer, Marker, and Reducto.

Eisenstein 06/16/2025

How does it do with handwriting?
