Hacker News

httpteapot | yesterday at 2:22 PM | 1 reply

What do you think of the DeepSeek OCR approach, where they say that vision tokens might compress a document better than its pure text representation?

https://news.ycombinator.com/item?id=45640594

I've spent some time feeding LLMs with scraped web pages, and I've found that retaining some style information (text size, visibility, decoration, image content) is non-trivial.
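
As a rough illustration of what "retaining some style information" could look like in a text-only pipeline, here is a minimal sketch using only Python's standard `html.parser`. Everything in it is an assumption made for illustration, not something from the DeepSeek OCR work or any particular scraper: the marker scheme (`#` for headings, `**` for bold), the `StyledTextExtractor` and `extract_styled_text` names, and the very naive visibility check, which only looks at the `hidden` attribute and inline `display:none` and ignores stylesheets entirely.

```python
from html.parser import HTMLParser

# Hypothetical marker scheme, chosen for illustration only.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
INLINE_MARK = {"b": "**", "strong": "**", "i": "_", "em": "_"}
BLOCK_TAGS = {"h1", "h2", "h3", "p", "div", "li"}
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class StyledTextExtractor(HTMLParser):
    """Flatten HTML to text while keeping a few style hints as markers."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in VOID_TAGS:
            # Keep image content as its alt text, if visible.
            if tag == "img" and not self.hidden_depth:
                alt = attrs.get("alt") or "no alt text"
                self.parts.append(f"[image: {alt}]")
            return
        style = (attrs.get("style") or "").replace(" ", "").lower()
        # Naive visibility check: inline style and `hidden` attribute only.
        if self.hidden_depth or "hidden" in attrs or "display:none" in style:
            self.hidden_depth += 1
            return
        if tag in BLOCK_PREFIX:
            self.parts.append(BLOCK_PREFIX[tag])
        elif tag in INLINE_MARK:
            self.parts.append(INLINE_MARK[tag])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            # Assumes well-nested markup inside hidden regions.
            self.hidden_depth -= 1
            return
        if tag in INLINE_MARK:
            self.parts.append(INLINE_MARK[tag])
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def extract_styled_text(html: str) -> str:
    parser = StyledTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    page = (
        "<h1>Title</h1>"
        '<p>Some <b>bold</b> text<span style="display: none">secret</span></p>'
        '<img src="a.png" alt="chart">'
    )
    print(extract_styled_text(page))
```

Even this toy version shows why the problem is non-trivial: real pages set size and visibility through external CSS and scripts, which a tag-level pass like this cannot see, and that gap is part of what makes a rendered, vision-based representation appealing.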


Replies

fbouvier | yesterday at 5:43 PM

Keeping some kind of style information is definitely important to understand the semantics of the webpage.