Hacker News

vlmutolo today at 7:03 PM

Pretty cool, rendering PowerPoint files to an image is probably the only way for LLMs to make sense of them.

Does this work in Cloudflare’s workerd environment? Would be nice to have a cheap serverless render -> LLM (GLM-OCR / PaddleOCR) -> Markdown pipeline for the various MS Office formats.


Replies

wmf today at 7:37 PM

This code creates a JSON intermediate representation that LLMs could probably consume. You might want to simplify it to focus on content and reduce token usage.
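
A rough sketch of what that simplification might look like, assuming a made-up IR shape: the field names here (`elements`, `text`, `x`, `y`, `style`, and so on) are hypothetical and not taken from the project's actual schema. The idea is just to drop layout and styling keys recursively and keep whatever content is left.

```python
import json
from typing import Any

# Hypothetical layout/styling keys to drop; the real IR may use different names.
LAYOUT_KEYS = {"x", "y", "width", "height", "style", "font", "color", "transform"}

# Values treated as "empty" and pruned after stripping.
EMPTY = (None, "", [], {})


def strip_layout(node: Any) -> Any:
    """Recursively remove layout keys and empty values from a JSON-like tree."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key in LAYOUT_KEYS:
                continue
            value = strip_layout(value)
            if value not in EMPTY:
                cleaned[key] = value
        return cleaned
    if isinstance(node, list):
        items = [strip_layout(item) for item in node]
        return [item for item in items if item not in EMPTY]
    return node


# Invented example slide, for illustration only.
slide = {
    "type": "slide",
    "elements": [
        {"type": "text", "text": "Q3 results", "x": 10, "y": 20, "style": {"bold": True}},
        {"type": "shape", "x": 0, "y": 0, "width": 100, "height": 50},
    ],
}

compact = strip_layout(slide)
# Compact separators shave a few more characters off the serialized form.
print(json.dumps(compact, separators=(",", ":")))
```

Whether position data is safe to drop probably depends on the deck; reading order on a slide sometimes only makes sense with coordinates, so a real version might sort elements by position before discarding it.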