logoalt Hacker News

squigzyesterday at 10:54 PM1 replyview on HN

> To archive Metabrainz there is no way but to browse the pages slowly page-by-page. There's no machine-communicable way that suggests an alternative.

Why does there have to be a "machine-communicable way"? If these developers cared about such things they would spend 20 seconds looking at this page. It's literally one of the first links when you Google "metabrainz"

https://metabrainz.org/datasets


Replies

whattoday at 1:51 AM

You expect the developers of a crawler to look at every site they crawl and develop a specialized crawler for them? That’s fine if you’re only crawling a handful of sites, but absolutely insane if you’re crawling the entire web.

show 1 reply