Show HN: Kage – Shadow any website to a single binary for offline viewing

80 points • by tamnd • today at 5:25 PM • 26 comments • view on HN

Comments

I find SingleFile [0] to be a much more robust version of this.

It strips out all the JavaScript too, but also packs everything into a single HTML file that is easy to transfer. Binary assets (like web fonts and images) are packed as base64 strings.

They also offer a CLI powered by Puppeteer. [1]

[0]: https://github.com/gildas-lormeau/singlefile

[1]: https://github.com/gildas-lormeau/single-file-cli

➕ show 4 replies

telesilla • today at 6:55 PM

I've been using httrack (https://www.httrack.com) to download wikis to read on flights, which isn't perfect but better than I'd found previously. I'll try this out, I'd be delighted to have good results. Thanks for the post.

wolttam • today at 5:59 PM

One use I'd have for this is company wikis that you want to give folks easy offline access to (maybe the wiki has documentation that's useful at sites that don't have cellular coverage).

Cool!

It would be especially cool to have a version that didn't require the separate serving process - even though it's nifty you can package up a whole site as a single binary.

Maybe a single HTML entrypoint shim with a bit of javascript that could index into an archive (potentially embedded) of the site's content?

➕ show 1 reply

Igor_Wiwi • today at 6:45 PM

This is quite useful tool, especially for the cases where internet access is limited (the flights for example). I implemented it as a separate feature in mdview.io: for example you can export a document as a html file for offline usage, with all the presentation features like reach tables, mermaid and etc built in. Example https://mdview.io/s/why-markdown-became-default-format-for-a... then try to Export - Export HTML

gregwebs • today at 5:52 PM

This seems like it has potential to create a lot of load on a site- are there settings to set how fast it clones or avoid images/videos? Is there a way to only get a subset of a website?

➕ show 1 reply

sanqui • today at 5:56 PM

Cool concept. I would like to see this combined with mitmproxy for archive grade fidelity. You could be saving exactly the data served and at the same time a representation by a modern (contemporary) browser, with all JS having run. This combination would be my perfect replacement for the WARC format.

➕ show 2 replies

dimiprasakis • today at 6:18 PM

Neat project, I like the idea. One thing from a quick read: you launch Chrome with --no-sandbox. Is there a good reason for that? Security wise it's probably not a good idea. If there is no reason, I'd suggest leaving the sandbox on!

In any case, cool stuff :)

lolpython • today at 6:23 PM

This is cool. I could see myself downloading the articles behind the first couple pages of hacker news with this, for viewing on a flight or long distance train ride with spotty internet

daviding • today at 6:26 PM

Nice idea! fwiw, false positives and all, but the Windows 11 default Windows Security doesn't like it: `leakless.exe: Operation did not complete successfully because the file contains a virus or potentially unwanted software.`

rahimnathwani • today at 5:56 PM

So this is like using wget --mirror except that it works on pages that require javascript, right?

➕ show 1 reply

delduca • today at 6:39 PM

curl can do this

grahamstanes17 • today at 6:10 PM

nice

alt Hacker News

Show HN: Kage – Shadow any website to a single binary for offline viewing

Comments