logoalt Hacker News

PixelPanda06/16/20257 repliesview on HN

Full disclaimer: I work at Nanonets

Excited to share Nanonets-OCR-s, a powerful and lightweight (3B) VLM model that converts documents into clean, structured Markdown. This model is trained to understand document structure and content context (like tables, equations, images, plots, watermarks, checkboxes, etc.). Key Features:

LaTeX Equation Recognition Converts inline and block-level math into properly formatted LaTeX, distinguishing between $...$ and $$...$$.

Image Descriptions for LLMs Describes embedded images using structured <img> tags. Handles logos, charts, plots, and so on.

Signature Detection & Isolation Finds and tags signatures in scanned documents, outputting them in <signature> blocks.

Watermark Extraction Extracts watermark text and stores it within <watermark> tag for traceability.

Smart Checkbox & Radio Button Handling Converts checkboxes to Unicode symbols like , , and for reliable parsing in downstream apps.

Complex Table Extraction Handles multi-row/column tables, preserving structure and outputting both Markdown and HTML formats.

Huggingface / GitHub / Try it out: https://huggingface.co/nanonets/Nanonets-OCR-s

Try it with Docext in Colab: https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...


Replies

RicoElectrico06/16/2025

Could be it used to (maybe with help of a downstream LLM) parse a photo/PDF of a restaurant menu into a JSON file conforming to a schema? Or would bigger, hosted multimodal LLMs work better in such case?

arkh06/17/2025

So it feels like it finally let me do one thing I'd wanted for some time: scan printed documents and generate structured pdfs (and not pdf as a picture container).

wisdomseaker06/17/2025

Would any of this be able to handle magazine layouts? I've yet to find anything that can follow their fairly random layouts with text at varying angles etc

uselesswords06/17/2025

Have you found it has better accuracy or scales with larger models? Or are the improvements, if any, marginal compared to the 3B VLM model?

gibsonf106/16/2025

Does it hallucinate with the LLM being used?

show 2 replies
generalizations06/16/2025

Does it have a way to extract the images themselves, or is that still a separate process later?

show 1 reply