Document: * https://im... | alt Hacker News

Eisenstein • 06/16/2025 • 1 reply

Document:

Result:

Perhaps it needed more than 1K tokens? But it took about an hour (number 28 in queue) to generate that and I didn't feel like trying again.

How many tokens does it usually take to represent a page of text with 554 characters?

Replies

souvik3333 • 06/16/2025

Hey, the reason for the long processing time is that lots of people are using it, probably with larger documents. I tested your file locally and it seems to be working correctly: https://ibb.co/C36RRjYs

Regarding the token limit, it depends on the text. We are using the qwen-2.5-vl tokenizer in case you are interested in reading about it.
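For a rough sense of scale on the 554-character question, here is a minimal sketch. The 4-characters-per-token average is an assumed rule of thumb for English prose, not a measured property of this tokenizer, and the model ID in `count_tokens_exact` is an assumption (any Qwen2.5-VL checkpoint on Hugging Face should ship the same tokenizer family). The exact count needs the `transformers` package and a download, so it is kept in a separate function that is not called by default.

```python
import math


def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count.

    chars_per_token is an assumed average for English prose; real
    tokenizers vary with language, formatting, and markup.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(num_chars / chars_per_token)


def count_tokens_exact(text: str,
                       model_id: str = "Qwen/Qwen2.5-VL-7B-Instruct") -> int:
    """Exact count with a Qwen2.5-VL tokenizer.

    model_id is an assumption for illustration. Requires the
    `transformers` package and network access on first use.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return len(tokenizer.encode(text, add_special_tokens=False))


if __name__ == "__main__":
    # 554 characters at ~4 characters per token
    print(estimate_tokens(554))  # prints 139
```

Under that assumption a 554-character page is on the order of 140 tokens, well under 1K. Markdown markup or misread handwriting in the output can raise the real count, so the exact function is the one to trust.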

You can run it very easily in a Colab notebook. This should be faster than the demo https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...

There are incorrect words in the extraction, so I would suggest waiting for the release of the handwritten text model.
