I think this is a good idea in general, but perhaps a bit too simple. It looks like this only works for static sites, right? The worker performs a JS fetch to pull in the HTML and then converts it (in a quick-and-dirty manner) to Markdown.
I know this is pointing to the GH repo, but I’d love to know more about why the author chose to build it this way. I suspect it keeps costs low/free. But why CF workers? How much processing can you get done for free here?
I’m not sure how you could do much more in a CF worker, but this might be too simple to be useful on many sites.
Example: I had to pull in a docs site that was built for a project I’m working on. We wanted an LLM to be able to use the docs in its responses. However, the site was built with VitePress and I didn’t have access to the source markdown files, so I wrote an MCP fetcher that uses a dockerized headless Chrome instance to load the page and then pulls the innerHTML directly from the rendered DOM. It’s probably overkill, but it’s an example of a case where this tool might not work.
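For context, the shape of that fetcher is roughly the sketch below. I’ve written it with puppeteer-core for illustration; the websocket endpoint, the `main` selector, and the function name are placeholders rather than the actual code:

```ts
import puppeteer from "puppeteer-core";

// Placeholder: whatever DevTools websocket the dockerized Chrome container exposes.
const BROWSER_WS = "ws://localhost:3000";

export async function fetchRenderedHtml(url: string): Promise<string> {
  // Connect to the already-running container instead of launching a local browser.
  const browser = await puppeteer.connect({ browserWSEndpoint: BROWSER_WS });
  try {
    const page = await browser.newPage();
    // Wait for client-side rendering to settle before reading the DOM.
    await page.goto(url, { waitUntil: "networkidle0", timeout: 30_000 });
    // Pull the rendered innerHTML, preferring the main content region.
    return await page.evaluate(() => {
      const main = document.querySelector("main") ?? document.body;
      return main.innerHTML;
    });
  } finally {
    await browser.disconnect();
  }
}
```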
But — if you have a static site, this tool could be a very simple way to configure MCP access. It’s a nice idea!
The simplicity is a feature. I avoided headless Chrome because standard fetch tools (and raw DOM dumps) pollute the context with navbars and scripts, wasting tokens. This parser strips that noise and converts the page to clean Markdown for maximum token density.
Also, because the docs are exposed as an MCP Resource rather than a Tool, they stay pinned in context instead of relying on the model to "decide" to fetch them.
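For anyone who hasn’t seen the distinction, here’s a minimal sketch of what that registration looks like with the MCP TypeScript SDK. The names, URI, and text are placeholders, and the exact helper may differ between SDK versions:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new McpServer({ name: "docs-server", version: "1.0.0" });

// As a Resource, the client can attach the docs to the context up front;
// nothing depends on the model choosing to call anything.
server.resource(
  "project-docs",            // placeholder name
  "docs://getting-started",  // placeholder URI
  async (uri) => ({
    contents: [
      {
        uri: uri.href,
        mimeType: "text/markdown",
        // In practice this would be the clean Markdown served by the Worker.
        text: "# Getting started\n...",
      },
    ],
  })
);

// A Tool, by contrast, only contributes to the context if the model decides to call it.

const transport = new StdioServerTransport();
await server.connect(transport);
```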
Cloudflare Workers handle this fine on the free tier (100k requests/day) without the overhead of managing a dockerized browser instance.
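To give a feel for how little is involved, here’s a rough sketch of the idea as a Worker: fetch the static page, do a quick-and-dirty strip-and-convert pass, return Markdown. `DOCS_ORIGIN` and the regexes are illustrative, not the actual implementation:

```ts
// Placeholder for the static docs site being proxied.
const DOCS_ORIGIN = "https://docs.example.com";

// Quick-and-dirty HTML to Markdown: drop the noise, keep the structure.
function htmlToMarkdown(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<nav[\s\S]*?<\/nav>/gi, "")
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, "# $1\n\n")
    .replace(/<h2[^>]*>([\s\S]*?)<\/h2>/gi, "## $1\n\n")
    .replace(/<h3[^>]*>([\s\S]*?)<\/h3>/gi, "### $1\n\n")
    .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n")
    .replace(/<a[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, "$1\n\n")
    .replace(/<[^>]+>/g, "")     // strip any remaining tags
    .replace(/\n{3,}/g, "\n\n")  // collapse extra blank lines
    .trim();
}

export default {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    // Fetch the static HTML and return it as Markdown.
    const upstream = await fetch(`${DOCS_ORIGIN}${pathname}`);
    const markdown = htmlToMarkdown(await upstream.text());
    return new Response(markdown, {
      headers: { "content-type": "text/markdown; charset=utf-8" },
    });
  },
};
```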