logoalt Hacker News

logicalleeyesterday at 5:53 PM3 repliesview on HN

Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data.


Replies

zozbot234yesterday at 7:18 PM

Wikisource has the original scans available in the public domain, and their enriched text under CC-BY-SA: https://en.wikisource.org/wiki/EB1911

ahaspelyesterday at 5:55 PM

Thanks!

The underlying text (1911 edition) is public domain, but the structured version here — the parsing, reconstruction, and linking — is something I put together for this site. Right now there isn’t a bulk download available. I’m considering exposing structured access (API or dataset) in some form, but haven’t decided exactly how that will work yet.

If you have a specific use case in mind (especially for training), I’d be interested to hear more.

show 2 replies
realityfactchexyesterday at 6:20 PM

> Is there any way to download it? The reason someone might want to download it is for use as training data.

Another reason would be to able to keep running/using it even if the main site were to go down for whatever reason eventually; or, to operate a mirror of it, for redundancy (linking back to the original, of course).