Hacker News

ahaspel · yesterday at 5:34 PM · 5 replies

I rebuilt the 1911 Encyclopædia Britannica into a clean, structured, navigable site:

https://britannica11.org/

What it does:

– ~37k articles reconstructed from the original volumes
– section-level structure (contents are clickable within articles)
– cross-references extracted and linked
– contributors indexed and searchable
– original volume + page references preserved and shown while reading
– links to the original scans for each page
– ancillary material included (prefaces, abbreviations, etc.)
– topic index reproduced and cross-linked
– full-text search with article metadata (length, volume, etc.)

Most of the work was in parsing and reconstruction: headings, multi-page articles, tables, math, languages, footnotes, plates, and all the small edge cases that come up in a work like this.
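
As a rough illustration of the cross-reference step mentioned above, here is a minimal sketch in Python. It is not the site's actual pipeline: `TITLE_INDEX`, `link_cross_references`, and the `/article/<slug>` URL shape are all hypothetical, and it only handles the simple case of a single capitalized word followed by the "(q.v.)" marker that reference works commonly use to point at another article.

```python
import re

# Hypothetical title index: upper-cased article title -> URL slug.
# (Illustrative data only, not taken from the site.)
TITLE_INDEX = {
    "ALGEBRA": "algebra",
    "GEOMETRY": "geometry",
}

# One capitalized word immediately followed by "(q.v.)", e.g. "Algebra (q.v.)".
# Multi-word titles are deliberately out of scope for this sketch.
QV_PATTERN = re.compile(r"\b([A-Z][a-z]+) \(q\.v\.\)")


def link_cross_references(text: str, index: dict[str, str]) -> str:
    """Wrap each 'Term (q.v.)' in a link when Term is a known article title."""

    def replace(match: re.Match) -> str:
        term = match.group(1)
        slug = index.get(term.upper())
        if slug is None:
            # Unknown title: leave the original text untouched.
            return match.group(0)
        return f'<a href="/article/{slug}">{term}</a> (q.v.)'

    return QV_PATTERN.sub(replace, text)


linked = link_cross_references(
    "compare Algebra (q.v.) and Alchemy (q.v.)", TITLE_INDEX
)
print(linked)
```

Terms that are missing from the index are left as plain text, so an incomplete title list degrades to unlinked prose rather than broken links.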

The goal was to make something that feels like the original, but is actually usable.

I’d especially appreciate feedback on:

– search quality
– navigation (sections, cross-references)
– anything that looks structurally off

Happy to answer questions about the pipeline or data model.


Replies

zozbot234 · yesterday at 7:51 PM

You might want to add The Reader's Guide to the Encyclopaedia Britannica; the PD text is available at https://www.gutenberg.org/ebooks/74039 and scans at https://archive.org/details/readersguidetoen00londuoft. It would fit naturally with the Ancillary material, which includes the topic-based index.

nyc_pizzadev · yesterday at 9:16 PM

Very nice. I actually spent a bit of time browsing a few topics, which is something I rarely do these days!

A few things... when I click an article and try to jump to a new topic, the top search box (labeled "Search titles and full text...") doesn't work. Second, when I first came to the site, I was a bit stuck. It took a bit of time to realize I need to click on "Articles" or even "Topics" to start browsing. Not sure why, maybe I expected the image to let me enter the site somehow...?

logicallee · yesterday at 5:53 PM

Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data.

gnerd00 · yesterday at 6:06 PM

Legal terms question here also: several major world economies are operating under very different rules regarding datasets and publication rights. I am in the USA / California. Will there be terms for me, given that I am not a giant deep-pockets FAANG, just a book person? Commercial use terms for "small business" scale?
