I'm writing an essay about how I use an ancient text editor, GNU Emacs, along with gptel, Gemini, some local models, yt-dlp, and patreon-dl to help me study an ancient language, Latin.
I want to show how I liberate poorly aligned, pixelated PDF image scans of century-old Latin textbooks from the Internet Archive and transform them into glorious Org mode documents while preserving important typographic details, nicely formatted tables, and some semantic document metadata. I also want to demonstrate how I use a high-performance XML database engine to quickly perform Latin-to-English lookups against an XML-TEI formatted edition of the 19th century Lewis & Short dictionary, and how I use a RESTXQ endpoint and some XQuery code to dynamically reformat the entries into Org mode for display in a pop-up buffer.
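To make the dictionary reformatting step concrete, here is a minimal sketch of turning one TEI entry into an Org subtree. It is written in Python with the standard library rather than the XQuery the essay will use, and everything specific in it is an assumption: the sample entry is invented for illustration, the element names (`entryFree`, `orth`, `sense`, `hi`) are assumed to match the TEI edition, and `entry_to_org` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample entry for illustration only; the real Lewis & Short
# markup is assumed (not confirmed here) to use similar TEI elements.
SAMPLE_ENTRY = """
<entryFree key="rosa">
  <orth>rosa</orth>
  <sense n="I">a <hi rend="ital">rose</hi></sense>
  <sense n="II">a term of endearment</sense>
</entryFree>
"""


def inline_to_org(elem):
    """Flatten an element to text, rendering italic <hi> as Org /emphasis/."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_to_org(child)
        if child.tag == "hi" and child.get("rend") == "ital":
            inner = f"/{inner}/"
        parts.append(inner)
        parts.append(child.tail or "")
    # Collapse runs of whitespace left over from the XML layout.
    return " ".join("".join(parts).split())


def entry_to_org(xml_text):
    """Render one dictionary entry as an Org heading plus a list of senses."""
    entry = ET.fromstring(xml_text)
    headword = entry.findtext("orth", default=entry.get("key", "?"))
    lines = [f"* {headword}"]
    for sense in entry.findall("sense"):
        label = sense.get("n", "")
        prefix = f"{label}. " if label else ""
        lines.append(f"- {prefix}{inline_to_org(sense)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_org(SAMPLE_ENTRY))
```

Doing this conversion inside the database, as the essay proposes, means Emacs only has to insert the returned text into the pop-up buffer.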
I intend to demonstrate how I built a transcription pipeline in Emacs Lisp that uses yt-dlp and patreon-dl to grab Latin-language audio content from the Internet, transcodes the audio with ffmpeg, does Voice Activity Detection and chunking in Python with Silero, loads the chunks into Gemini's context window and sends them off for transcription and macronization, gathers forced-alignment data using a local wav2vec2-latin model, and finally adds word-level linguistic analysis (POS, morphology, lemmas) using a local Stanza model trained on the Classical corpus.
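The chunking step is the easiest part of that pipeline to sketch in isolation. The following assumes the VAD stage has already produced a list of `(start, end)` speech segments in seconds, and greedily packs them into chunks that never exceed a maximum duration and only ever break at a silence. The function name `pack_segments` and the 30-second default are illustrative assumptions, not taken from the actual pipeline.

```python
def pack_segments(segments, max_chunk_s=30.0):
    """Greedily merge consecutive speech segments into chunks.

    segments: list of (start, end) tuples in seconds, sorted by start time,
              as a VAD stage might emit them (assumed input shape).
    Returns a list of (start, end) chunks. A chunk is closed as soon as
    adding the next segment would make it longer than max_chunk_s, so cuts
    always fall in the silence between two segments. A single segment that
    is itself longer than max_chunk_s is kept whole.
    """
    chunks = []
    current_start = current_end = None
    for start, end in segments:
        if current_start is None:
            # First segment opens the first chunk.
            current_start, current_end = start, end
        elif end - current_start <= max_chunk_s:
            # Still fits: extend the open chunk through this segment.
            current_end = end
        else:
            # Would overflow: close the open chunk and start a new one.
            chunks.append((current_start, current_end))
            current_start, current_end = start, end
    if current_start is not None:
        chunks.append((current_start, current_end))
    return chunks


if __name__ == "__main__":
    speech = [(0.0, 4.0), (5.0, 9.0), (10.0, 14.0), (20.0, 23.0)]
    print(pack_segments(speech, max_chunk_s=10.0))
```

Cutting only at detected silences keeps words from being split across chunk boundaries, which matters for both transcription and the later forced-alignment pass.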
This all gets saved to an XML file which is loaded into BaseX along with some metadata. I'll then demonstrate some Emacs Lisp code which pulls it into an Org-mode based transcription buffer and minor mode for reading and study, where I can play audio of any given Latin word, sentence, or paragraph, thanks to the forced-alignment and linguistic analysis data being stored in hidden text properties when the data is fetched from the database.
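The playback idea reduces to a small lookup: once every word carries its own start and end time, the audio range for a larger unit is just the earliest start and latest end among the words that belong to it. Here is a sketch of that lookup in Python (the essay's version lives in Emacs Lisp text properties). The `Word` record, the `playback_span` name, and the timings in the usage example are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float   # seconds, from forced alignment
    end: float     # seconds, from forced alignment
    sentence: int  # index of the sentence this word belongs to


def playback_span(words, index, unit="word"):
    """Return the (start, end) audio range for the unit containing words[index]."""
    target = words[index]
    if unit == "word":
        return (target.start, target.end)
    if unit == "sentence":
        group = [w for w in words if w.sentence == target.sentence]
        return (min(w.start for w in group), max(w.end for w in group))
    raise ValueError(f"unsupported unit: {unit!r}")


if __name__ == "__main__":
    # Made-up timings, purely to exercise the function.
    words = [
        Word("Gallia", 0.0, 0.6, 0),
        Word("est", 0.6, 0.8, 0),
        Word("omnis", 0.8, 1.3, 0),
        Word("Belgae", 2.0, 2.7, 1),
    ]
    print(playback_span(words, 1, "word"))
    print(playback_span(words, 1, "sentence"))
```

The resulting pair can be handed to any audio player that accepts a start offset and a duration; paragraphs work the same way with one more grouping key.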
Lastly, I'd like to explore how to leverage these tools to automatically create flash cards with audio cues in Org mode using the anki-editor Emacs minor mode for sentence mining.
Emacs is ancient? I use it every day. And they just came out with a new major update.
This is insanely cool. Thanks for sharing. I'll follow you on https://muppetlabs.com/.