Hacker News

aragonite, yesterday at 7:59 PM

> But I would love an option (emphasis on option) to see the text side by side with the page images. ... That way, I could "confirm" or "fact check" the faithfulness of the OCR.

You can already do that on Wikisource. For example, here's p. 658 from the entry on "Molecule":

https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

Also, OP: I noticed some fidelity issues in your version (at https://britannica11.org/article/18-0684-s2/molecule). For example, parts of the math formula under the line that ends with "the molecules of other kinds" ([1]) are missing (compare [2]). Also, in your version, fn. 1 of this article is attached to "as they have always done" ([3]), but it should actually be attached to "Atom" on p. 654 ([4]):

[1] https://britannica11.org/article/18-0684-s2/molecule#:~:text...

[2] https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...

[3] https://britannica11.org/article/18-0684-s2/molecule#:~:text...

[4] https://en.wikisource.org/wiki/Page:EB1911_-_Volume_18.djvu/...


Replies

realityfactchex, yesterday at 9:24 PM

That's cool about the Wikisource parallel text + image page view, TIL. Thanks!

As an example flow (since it took a minute to figure out): start at https://en.wikisource.org/wiki/1911_Encyclopædia_Britannica, then browse volume > section > topic to get to a text page, click the Source tab, then click a page number (you may need to hunt around for the correct one) to see the parallel text + image view. The previous and next page buttons keep you in that parallel view.
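
If you'd rather jump straight to a scan page than click through, the Page: links quoted in this thread suggest a URL pattern you can build directly. Here is a minimal Python sketch under some assumptions: the helper names `eb1911_page_url` and `neighbor_urls` are made up for illustration, the two-digit volume padding is a guess, and the scan index 700 is just a placeholder (it is the position in the DjVu file, which generally differs from the printed page number, hence the hunting around).

```python
from urllib.parse import quote

# Base of the Page: namespace links quoted in the thread.
WIKISOURCE_BASE = "https://en.wikisource.org/wiki/"


def eb1911_page_url(volume: int, scan_page: int) -> str:
    """Build a side-by-side (text + image) page URL.

    Hypothetical helper: the title pattern is inferred from the links in
    the thread; zero-padding single-digit volumes is an assumption.
    """
    title = f"Page:EB1911_-_Volume_{volume:02d}.djvu/{scan_page}"
    # Keep ':' and '/' readable; other unsafe characters get percent-encoded.
    return WIKISOURCE_BASE + quote(title, safe=":/")


def neighbor_urls(volume: int, scan_page: int):
    """Return (previous, next) URLs, mimicking the prev/next page buttons."""
    prev_url = eb1911_page_url(volume, scan_page - 1) if scan_page > 1 else None
    return prev_url, eb1911_page_url(volume, scan_page + 1)


if __name__ == "__main__":
    # 700 is a placeholder scan index, not a known article location.
    print(eb1911_page_url(18, 700))
    print(neighbor_urls(18, 700))
```

Stepping `scan_page` up or down by one is the scripted equivalent of the previous/next buttons, so once you have found the right index for a printed page, nearby pages are a small offset away.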