logoalt Hacker News

Show HN: Kage – Shadow any website to a single binary for offline viewing

80 pointsby tamndtoday at 5:25 PM26 commentsview on HN

Comments

maxlohtoday at 5:35 PM

I find SingleFile [0] to be a much more robust version of this.

It strips out all the JavaScript too, but also packs everything into a single HTML file that is easy to transfer. Binary assets (like web fonts and images) are packed as base64 strings.

They also offer a CLI powered by Puppeteer. [1]

[0]: https://github.com/gildas-lormeau/singlefile

[1]: https://github.com/gildas-lormeau/single-file-cli

show 4 replies
telesillatoday at 6:55 PM

I've been using httrack (https://www.httrack.com) to download wikis to read on flights, which isn't perfect but better than I'd found previously. I'll try this out, I'd be delighted to have good results. Thanks for the post.

wolttamtoday at 5:59 PM

One use I'd have for this is company wikis that you want to give folks easy offline access to (maybe the wiki has documentation that's useful at sites that don't have cellular coverage).

Cool!

It would be especially cool to have a version that didn't require the separate serving process - even though it's nifty you can package up a whole site as a single binary.

Maybe a single HTML entrypoint shim with a bit of javascript that could index into an archive (potentially embedded) of the site's content?

show 1 reply
Igor_Wiwitoday at 6:45 PM

This is quite useful tool, especially for the cases where internet access is limited (the flights for example). I implemented it as a separate feature in mdview.io: for example you can export a document as a html file for offline usage, with all the presentation features like reach tables, mermaid and etc built in. Example https://mdview.io/s/why-markdown-became-default-format-for-a... then try to Export - Export HTML

gregwebstoday at 5:52 PM

This seems like it has potential to create a lot of load on a site- are there settings to set how fast it clones or avoid images/videos? Is there a way to only get a subset of a website?

show 1 reply
sanquitoday at 5:56 PM

Cool concept. I would like to see this combined with mitmproxy for archive grade fidelity. You could be saving exactly the data served and at the same time a representation by a modern (contemporary) browser, with all JS having run. This combination would be my perfect replacement for the WARC format.

show 2 replies
dimiprasakistoday at 6:18 PM

Neat project, I like the idea. One thing from a quick read: you launch Chrome with --no-sandbox. Is there a good reason for that? Security wise it's probably not a good idea. If there is no reason, I'd suggest leaving the sandbox on!

In any case, cool stuff :)

lolpythontoday at 6:23 PM

This is cool. I could see myself downloading the articles behind the first couple pages of hacker news with this, for viewing on a flight or long distance train ride with spotty internet

davidingtoday at 6:26 PM

Nice idea! fwiw, false positives and all, but the Windows 11 default Windows Security doesn't like it: `leakless.exe: Operation did not complete successfully because the file contains a virus or potentially unwanted software.`

rahimnathwanitoday at 5:56 PM

So this is like using wget --mirror except that it works on pages that require javascript, right?

show 1 reply
delducatoday at 6:39 PM

curl can do this

grahamstanes17today at 6:10 PM

nice