Hacker News

Abogen – Generate audiobooks from EPUBs, PDFs and text

325 points by mzehrer last Sunday at 5:56 AM | 78 comments

Comments

dsign last Sunday at 8:11 AM

Nice!

As an aside, while this tool can be used to create an audiobook from a book you have in text format, for your private consumption, having an author employ something like this to create files for distribution is extremely risky, even if they acknowledge its use and intend those files to only be available on their website.

Indie authors struggle a lot to promote their works, and the new normal is that potential readers, the polite ones[^1], use the slightest hint of AI usage as a reason to discard a title and move on, as they are entitled to, since there are so many books.

I, in particular, have started hiring voice actors who have good acting skills and good diction but for whom English is a second language, or who speak English as a first language but something else at home; sometimes I even ask them to push their accents a notch further. It helps readers recognize the narration as non-AI, and it also increases the appeal of the book for people who would like to try out something new. Once, I held an audition for a project and was pleasantly surprised by how much life people from around the Mediterranean basin were able to inject into their readings, compared with people from Britain and North America.

[^1] Impolite readers set the town on fire, and then go about and spread that fire to neighboring towns, for good measure.

m_sahaf last Sunday at 8:16 PM

I imagine a pipeline from Calibre-Web[0] through Abogen to Audiobookshelf[1]: Calibre-Web supplies the books, Abogen generates their audio versions, and Audiobookshelf serves them. A great solution for the visually impaired.
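A minimal sketch of the first step of such a pipeline, under assumptions: it supposes (hypothetically) that the Calibre library keeps each book as an EPUB somewhere under one folder, and that each finished audiobook is saved under the Audiobookshelf folder with the same file stem. It only finds books that still need converting; how to drive Abogen itself from a script isn't covered in this thread, so that step is left out. The name `pending_books` and the suffix list are illustrative, not part of any of these projects.

```python
from pathlib import Path

# Assumed (hypothetical) layout: EPUBs anywhere under the Calibre library,
# finished audiobooks under the Audiobookshelf folder, same file stem.
AUDIO_SUFFIXES = {".m4b", ".mp3", ".ogg", ".wav"}


def pending_books(library_dir: Path, audio_dir: Path) -> list[Path]:
    """Return EPUBs that do not yet have an audio file with the same stem."""
    done = {
        p.stem
        for p in audio_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    }
    # Sorted so repeated runs process books in a stable order.
    return sorted(
        epub for epub in library_dir.rglob("*.epub") if epub.stem not in done
    )
```

Run periodically (e.g. from cron), each returned path would be handed to Abogen, and the output written where Audiobookshelf scans for new books.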

[0] https://github.com/janeczku/calibre-web

[1] https://github.com/advplyr/audiobookshelf

anotherpaul last Sunday at 7:16 AM

Does it turn it into spoken word or an audiobook? Because good audiobooks often have voice actors who read the characters with different emphasis and dialects. I imagine tools like ChatGPT could do this for a few sentences, but what about an 8-20 hour audiobook?

I think there are still basic hurdles to clear before we can go from EPUB to audiobook at a quality that can compete with the current state of the art.

Or am I missing something?

frumiousirc last Sunday at 10:59 AM

This needs to be run from an environment where `pip` is available, because abogen invokes that tool while it runs. Using `uv tool run abogen` gets you started, but then the app hangs at model install time. `uv venv && uv pip install pip && source .venv/bin/activate && abogen` lets it run properly.
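Since abogen reportedly calls `pip` at runtime, a quick pre-flight check can save you from a hung model install. This sketch (the name `pip_status` is hypothetical) reports both whether `pip` is importable in the current interpreter and whether a `pip` executable is on `PATH`, because the thread doesn't say which of the two abogen relies on.

```python
import importlib.util
import shutil


def pip_status() -> tuple[bool, bool]:
    """Return (pip importable in this interpreter, pip executable on PATH)."""
    importable = importlib.util.find_spec("pip") is not None
    on_path = shutil.which("pip") is not None
    return importable, on_path


if __name__ == "__main__":
    importable, on_path = pip_status()
    print(f"pip module importable: {importable}; pip on PATH: {on_path}")
```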

Otherwise, it's a nicely packaged GUI. Well done!

I tried a PDF; the UI for selecting pages or sections is good, and generation is fast on my laptop's GTX 1650.

The result is an .ogg audio file and an .ass subtitle file. Playing them with mpv lets you listen and read along in the terminal. The only issue I have with the result is that visual line breaks from the PDF are preserved, resulting in long pauses "randomly" in the middle of sentences. This greatly interferes with understanding the audio.

Edit: enabling the skipping of single newlines helps!
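To illustrate what skipping single newlines does, here is a minimal sketch (not abogen's actual implementation; `skip_single_newlines` is a hypothetical name): a lone newline is treated as a layout wrap and turned into a space, while blank lines between paragraphs are kept.

```python
import re


def skip_single_newlines(text: str) -> str:
    """Turn lone (layout) newlines into spaces, keep blank-line paragraph breaks."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    # A newline with no newline directly before or after it is a visual wrap.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces left behind by trailing spaces on wrapped lines.
    return re.sub(r" {2,}", " ", text)
```

For example, `"The quick brown\nfox jumps.\n\nNew paragraph\nhere."` becomes `"The quick brown fox jumps.\n\nNew paragraph here."`.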

logicprog last Sunday at 10:58 AM

I've been using this to try to make audiobooks out of various philosophy books I've been wanting to read, for accessibility reasons, and I ran into a critical problem: if the input text fed to Kokoro is too long, it starts skipping words at the end or in the middle, or fades out at the end. Abogen chunks the text it feeds to Kokoro by sentence, so sentences of arbitrary length reach Kokoro without any length guard. This produces unusable audiobooks for me. I'm working on "vibe coding" my own Kokoro-based personal Tkinter GUI app for the same purpose, which uses NLTK and some regex magic for better splitting.
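One way to add the missing length guard, sketched under assumptions: split on sentence-ending punctuation with a plain regex (instead of NLTK, to stay dependency-free), then break any sentence longer than `max_chars` at the last comma, semicolon or colon that fits, falling back to the last space and finally to a hard cut. The 300-character default is a placeholder, since the thread doesn't say where Kokoro starts dropping words, so tune it by ear. `chunk_text` is a hypothetical helper, not part of abogen or Kokoro, and its naive regex also splits after abbreviations such as "Mr.".

```python
import re

# Split after ., ! or ? followed by whitespace (naive: also splits after "Mr.").
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split into sentences, then split any sentence longer than max_chars."""
    chunks: list[str] = []
    for sentence in SENTENCE_END.split(text.strip()):
        while len(sentence) > max_chars:
            window = sentence[: max_chars + 1]
            # Prefer a clause boundary: last ", ", "; " or ": " in the window.
            cut = max(window.rfind(", "), window.rfind("; "), window.rfind(": "))
            if cut > 0:
                cut += 1  # keep the punctuation mark with the first half
            else:
                cut = window.rfind(" ")  # otherwise the last word boundary
                if cut <= 0:
                    cut = max_chars  # no space at all: hard cut
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if sentence:
            chunks.append(sentence)
    return chunks
```

Every chunk is at most `max_chars` long, and short sentences pass through unchanged, so normal prose is still read sentence by sentence.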

gman83 last Sunday at 11:32 AM

I love audiobooks, but I'm a stickler for good narration. I've stopped listening to plenty of audiobooks because I didn't like the narrator. I guess it will be a long time before I can use something like this.

8s2ngy last Sunday at 7:08 AM

I've been using Kokoro TTS with the CLI app audiblez, mentioned in the "Similar Projects" section of the README. The model is fast and delivers impressive quality for its small size. Some issues I have faced, however: a) it doesn't distinguish periods at the end of sentences from the dots in abbreviations such as "Mr." or "Mrs.", which results in an awkward pause between "Mr." and the name; b) it doesn't handle ellipses well; c) words are pronounced the same way regardless of context.
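For issue (a), a common workaround is to expand abbreviations before the text reaches the TTS model, so no sentence-like period remains. A minimal sketch with a small, hypothetical lookup table (extend it for your own books); it only expands an abbreviation when whitespace follows it, and it does nothing for (b) or (c).

```python
import re

# Hypothetical table; extend it for your own books. "Dr." is assumed to mean
# Doctor; ambiguous cases such as "St." (Saint or Street) are left out.
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Missus", "Dr.": "Doctor"}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")(?=\s)"
)


def expand_abbreviations(text: str) -> str:
    """Spell out known abbreviations so the TTS sees no sentence-like period."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
```

For example, `expand_abbreviations("Mr. Smith met Mrs. Jones and Dr. Who.")` returns `"Mister Smith met Missus Jones and Doctor Who."`.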

amaccuish last Sunday at 9:19 AM

Amazing, but I'm personally waiting for the tool that generates a well-formatted EPUB from a PDF.

floppyd last Sunday at 7:49 AM

I tried Kokoro for voicing blog posts and articles and, to be honest, wasn't impressed. Right now Gemini 2.5 Flash TTS is a much more capable system with generous free limits (about 10 minutes per generation and about 90 minutes per day). Voices are not very consistent between generations; for shorter pieces that's not a big deal, but it obviously will be for books.
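To put those limits in perspective for a whole book, here is a back-of-the-envelope calculation. The defaults are the approximate figures quoted in the comment above (10 minutes per generation, 90 minutes per day), not official quotas, and `tts_budget` is a hypothetical helper.

```python
import math


def tts_budget(
    book_hours: float, per_call_min: float = 10, per_day_min: float = 90
) -> tuple[int, int]:
    """Return (number of generations, number of days) needed for a book."""
    total_min = book_hours * 60
    return math.ceil(total_min / per_call_min), math.ceil(total_min / per_day_min)
```

For a 10-hour audiobook, `tts_budget(10)` gives `(60, 7)`: 60 separate generations spread over 7 days, each one a chance for the voice to drift.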

xtracto last Sunday at 4:20 PM

I assume it doesn't work well for books that have non-text structured elements (code, diagrams, etc.) or images (which is expected).

I wonder, is there some open-source neural network that can consume PDF pages and produce a "pure prose" version of them? Say, a page with mixed text and an image of a car engine would be output as the text followed by a detailed description of the image, or of what it depicts.

vismit2000 last Monday at 4:27 AM

How does this compare with the lightweight Kitten TTS model that was recently on HN? https://news.ycombinator.com/item?id=44807868

TOGoS last Sunday at 7:08 AM

The demo video doesn't seem to have any audio in it! At least none that either ffmpeg or whatever Firefox uses can recognize.

numb7rs last Sunday at 6:37 PM

You will want to reconsider the name if you plan to have a presence in Australia or New Zealand. "Abo" is an ethnic slur similar in offensiveness to the N-word.

dumbasrocks last Monday at 4:14 AM

Would you call a network packet generation tool Pakigen?

obfuscator last Monday at 7:24 AM

The biggest selling point of this for me would be that the volume is probably the same throughout the whole text. I listen to audiobooks to fall asleep, and many voice actors go from very quiet to loud in conversations. It may be good narration, but it's sometimes too quiet to understand, so I need to increase the volume, only to be woken up by some loud lines later.

So I imagine generated audiobooks would be good in that regard. Another option would be a "normalize volume" setting on Audible or other services.
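A "normalize volume" feature usually means applying a gain so that each piece of audio reaches a common loudness. A minimal sketch, assuming mono float samples in [-1, 1] and using RMS as a rough loudness measure (real tools tend to use perceptual measures such as LUFS); `normalize_rms` is a hypothetical helper. Applied to each generated chunk before concatenation, it keeps chunk-to-chunk levels equal; it does not even out loud and quiet moments within a chunk, which is what a human-narrated recording would need (dynamic range compression).

```python
import math


def normalize_rms(samples: list[float], target_rms: float = 0.1) -> list[float]:
    """Scale float samples in [-1, 1] so their RMS level equals target_rms."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_rms / rms
    # Clip afterwards so a large gain cannot push peaks outside [-1, 1].
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```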

poulpy123 last Sunday at 10:20 AM

Perfect, I was looking for something like that! Is it GUI-only, or is there an API available? I would like to be able to share a link or some text from my phone and get the audio back.

scotty79 last Sunday at 8:04 AM

I think the quality of the voice is super important for audiobooks, and I think TTS is only now closing in on the required quality.

I played a bit with ElevenLabs voices, and while they aren't bad, when I tried to make them read a fragment of text that I wrote, it sounded chaotic, boring and quite terrible for anything longer than a sentence or two. But when I tried their v3 voices, which they are currently rolling out, the same text sounded consistent, emotional, engaging, simply amazing. I think we are just crossing the vocal uncanny valley.

nikolayasdf123 last Sunday at 6:58 AM

Can I choose any voice? I would love to listen to software engineering books in the voice of Morgan Freeman, or, even better, Scarlett Johansson.

lynx97 last Sunday at 9:40 PM

DAISY would be a desirable output format.

leke last Sunday at 8:44 AM

How big is this app?