Hacker News
Show HN: An open-source, self-hostable synced narration platform for ebooks (smoores.gitlab.io)
338 points by smoores 8 months ago | 67 comments
Hi, I made a thing! This is by far the most work I've ever sunk into a side project; I've been working on this thing for over two years, and I'm super proud of it, even though there's still a lot more to do!

Storyteller is a self-hosted platform for ebooks with synced narration. This is basically self-hosted WhisperSync, for anyone familiar with that Amazon product.

It's currently made up of two self-hostable backend systems and a mobile app for reading and listening to the books it produces. Technically it uses an open spec, EPUB 3's "Media Overlay", for syncing the narration, but very few ebook apps actually support Media Overlays, and even fewer work well and have nice interfaces.
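For anyone unfamiliar with Media Overlays: the spec pairs a text fragment in the EPUB with a clip of an audio file using SMIL `<par>` elements. Here is a minimal sketch of generating one such element. The function name, ids, and file paths are made up for illustration; this is not Storyteller's actual code.

```python
from xml.sax.saxutils import quoteattr


def smil_par(par_id: str, text_src: str, audio_src: str, begin: float, end: float) -> str:
    """Return one Media Overlay <par> pairing a text fragment with an audio clip.

    begin and end are offsets into the audio file, in seconds.
    """
    return (
        f"<par id={quoteattr(par_id)}>"
        f"<text src={quoteattr(text_src)}/>"
        f"<audio src={quoteattr(audio_src)} "
        f'clipBegin="{begin:.3f}s" clipEnd="{end:.3f}s"/>'
        "</par>"
    )


# Hypothetical sentence "s1" in chapter1.xhtml, narrated in the first 2.75 seconds.
print(smil_par("s1", "chapter1.xhtml#s1", "audio/ch1.mp3", 0.0, 2.75))
```

A reading app that supports Media Overlays highlights the element referenced by `text` while it plays the referenced `audio` clip.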

The mobile app is available on the Apple App Store as "Storyteller Reader", and I plan to release it for Android as well early next year.

Anyway, I hope someone finds this interesting or useful!




Amazing! I made a similar ebook-audiobook aligner years ago: https://github.com/r4victor/syncabook. At that time, I chose to synthesize the text and align two audio sequences because I found text-alignment approaches (including ML-based ones) too compute-intensive and inadequate for long texts. I see Storyteller works by aligning the texts. Could you give some idea of how long it takes to sync a book?

Also, my experience was that audio and text versions are often very different (e.g. the audio having an intro missing from the text). It'd be very interesting to know how well Storyteller handles such cases. Does it require manual audio/text editing or handle the differences automatically?


Hello! syncabook is awesome, and indeed Storyteller does take "the opposite" approach when it comes to forced alignment.

Others have linked to the docs, where I go into detail about the syncing algorithm, but at a high level:

Storyteller uses Whisper to transcribe the audio to text (this is the most computationally expensive part of the process)

Then we use a Levenshtein-distance-based fuzzy search algorithm to find each chapter in the text (this is attempting to account for the difference between audio and text versions, as you said!)

Then for each chapter, we find the start and end timestamp of each sentence, again using a fuzzy search across the transcription.

In general, Storyteller does a pretty good job; it treats the ebook as the source of truth, which means that at the moment it sometimes misses introductory and ending pieces of the audiobook, though it's on the roadmap to have some support for explicitly triggering those when that happens.
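To make the "Levenshtein-distance-based fuzzy search" idea concrete, here is a minimal sketch of approximate substring matching. It is a generic textbook approach (edit distance where the match may start anywhere in the haystack), not Storyteller's actual implementation, and the name `fuzzy_find` is hypothetical.

```python
def fuzzy_find(needle: str, haystack: str) -> tuple[int, int]:
    """Return (end_index, distance) of the best approximate match.

    distance is the Levenshtein distance between needle and the closest
    substring of haystack; end_index is where that substring ends.
    """
    # prev[i] = cost of matching needle[:i] against a substring ending at
    # the previous haystack position (initially: against the empty string).
    prev = list(range(len(needle) + 1))
    best_end, best_dist = 0, prev[-1]
    for j, hay_char in enumerate(haystack, start=1):
        cur = [0]  # a match may start anywhere, so an empty prefix is free
        for i, needle_char in enumerate(needle, start=1):
            cur.append(min(
                prev[i] + 1,                            # extra haystack character
                cur[i - 1] + 1,                         # missing needle character
                prev[i - 1] + (needle_char != hay_char),  # match or substitution
            ))
        if cur[-1] < best_dist:
            best_end, best_dist = j, cur[-1]
        prev = cur
    return best_end, best_dist


transcript = "and then the quick brown fox jumped"
end, dist = fuzzy_find("the quick brwn fox", transcript)
print(transcript[:end], dist)
```

This runs in time proportional to `len(needle) * len(haystack)`, so in practice you would want to search a limited window of the transcription rather than the whole book.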


The docs say it's usually 1-4 hours depending on the book and the hardware: https://smoores.gitlab.io/storyteller/docs/syncing-books

The docs also have a detailed section about the algorithm that goes into how it auto-handles differences between the audio and the text.


One obvious optimization is to sample the audio file at regular intervals and transcribe only those samples, then interpolate the locations in between. This can speed it up by a couple of orders of magnitude.
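A rough sketch of the interpolation step, assuming you already have a few anchor points that map a character offset in the ebook to a timestamp in the audio (how those anchors are found is out of scope here, and the names are hypothetical):

```python
from bisect import bisect_right


def interpolate_time(anchors: list[tuple[int, float]], char_offset: int) -> float:
    """Estimate the audio time (seconds) for a character offset in the text.

    anchors is a sorted list of (char_offset, seconds) pairs obtained from
    the sampled transcriptions; it needs at least two entries with distinct
    offsets. Offsets outside the anchors are extrapolated linearly.
    """
    offsets = [offset for offset, _ in anchors]
    i = bisect_right(offsets, char_offset)
    i = min(max(i, 1), len(anchors) - 1)  # clamp to a valid segment
    (o0, t0), (o1, t1) = anchors[i - 1], anchors[i]
    return t0 + (t1 - t0) * (char_offset - o0) / (o1 - o0)


anchors = [(0, 0.0), (1000, 60.0), (3000, 150.0)]
print(interpolate_time(anchors, 500))   # halfway through the first segment
print(interpolate_time(anchors, 2000))  # halfway through the second segment
```

The estimate is only as good as the assumption that the narrator reads at a constant pace between anchors.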


This is true, but it really limits the ability to highlight the current sentence visually while it's being read, which is great for language learning and for reducing cognitive load. I actually spent a lot of time trying to get the timing as precise as I could to make this feel as natural as possible, and I think the effect is really nice!


Ah, makes sense. Maybe have it as an option?

And I hadn't realized that you can actually see sentences highlighted as they are being read. I'd love that for Chinese (I'm learning it, so it'd help me a lot). I'll try and see if it "just works", and contribute a patch if it doesn't.


There's an open ticket for languages other than English! https://gitlab.com/smoores/storyteller/-/issues/10. If you want to take a look, please do! I don't have any contributing guidelines yet, unfortunately, but they'll probably come soon. I think Whisper does have Cantonese and Mandarin support, so it should be possible to add support for those languages, though we'll have to look into nltk support for sentence tokenization as well!
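In case nltk's sentence tokenizer turns out not to cover Chinese, a simple fallback is to split on full-width sentence-ending punctuation. This is only a sketch under that assumption, with a hypothetical function name; it does not handle closing quotation marks that follow the punctuation.

```python
import re

# Full-width full stop, exclamation mark, and question mark.
_SENTENCE = re.compile(r"[^。！？]+[。！？]*")


def split_chinese_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping the ending punctuation attached."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


print(split_chinese_sentences("你好。今天天气很好！你呢？"))
```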


A few years ago, I made that as a YouTube channel based on LibriVox audiobooks, maybe you'll enjoy it:

Simplified Mandarin: https://youtube.com/playlist?list=PLVlVz7EDz7fprPeVpqQCvlkMI...

Traditional Mandarin: https://youtube.com/playlist?list=PLVlVz7EDz7fpQZr29P5hqVveL...

... also 33 other languages https://youtube.com/@literature_for_eyes_and_ears/playlists

I left it to languish once I discovered the demand wasn't that great and I was spending more time making the videos than people ultimately spent watching them.


This is really neat, it's something I hadn't thought about before. I've started listening to audiobooks on my commute, but I read at night. I currently use audiobookshelf[0] to listen to my audiobooks, and it has support for ebooks as well. I've added a comment[1] on a discussion asking whether audiobookshelf could read the EPUBs your tool creates.

[0]: https://www.audiobookshelf.org/

[1]: https://github.com/advplyr/audiobookshelf/issues/189#issueco...


> I've started listening to audiobooks on my commute, but I read at night.

This is basically exactly why I started down this road over two years ago. I really wanted to be able to switch back and forth between my audiobooks and their text representations!

Thanks for mentioning Storyteller in that discussion, I'll have to hop in there!


Looking forward to trying out the Android app when it's available!


I absolutely love this. However, my wife and kids all read EPUBs on Kobo e-readers, so I wish we could somehow sync the last page read from Kobo to Storyteller so that we could pick up on the audiobook later. I’m not opposed to installing KOReader on all of our Kobos either if that would be required for syncing… it does look like KOReader doesn’t support EPUB 3 Media Overlays, but it does have a sync feature.


Thanks! Server-side position syncing and integration with KOReader are both on the roadmap, actually; you're not the first one to bring this up!


Amazon now has the text (books), and the audio (audible), and it’s absurd that there’s not some sort of sync feature. It would actually encourage people to cross-purchase books. There are so many times that I’m reading an ebook and I want to continue while driving and wish there was some sort of obvious and seamless “handoff” to continue with the audio version.


This has existed for a while, it's called Whispersync for Voice. Not available for all titles but it's there


TIL!

It looks like the feature is only available if you “add on” the audible version when you’re making your ebook purchase? And, for limited titles.

If I just bought a book in audible, there should be a “buy ebook” button in that app! And if I have the book in kindle, it should give me the option to add on the audio book after purchase. Seems like a missed opportunity— there must be a reason for it being so clunky.

Edit: I have not been able to find a single whispersync title. Looks like it’s not enabled at all in Canada? And the US books that have the feature don’t even follow the setup (eg icon) as described on the website (https://www.audible.com/ep/wfs)


You don’t need to buy ebook and audiobook at the same time.

The icon appears when looking at the audiobook’s product page (both on Amazon and Audible). Unfortunately the ebook page doesn’t appear to have something similar.


Wow, this really blew up while I wasn't looking! Thank you everyone who's popped in here to ask questions and give feedback. If anyone does spend some time trying to set this up, please don't hesitate to hop into our Gitter channel (https://smoores.gitlab.io/storyteller/docs/say-hi) and say hi or ask for support or give feedback.


More information would be nice, like a link to the iOS app, screenshots, or what features the project has.

Is it an ebook/audiobook library like audiobookshelf with sync, or just sync? (https://www.audiobookshelf.org/)


That's a great point; I'll try to add some more of these. I definitely meant to link to the app store page from the docs; I actually just updated them to include that.

It's a full ebook/audiobook library, with sync, though I've focused much more on the reader experience so far than the library management experience. Improving the library management experience is on the horizon, though!


Finding the app wasn’t super easy; I do wish they’d link to it from the mobile apps page.

https://apps.apple.com/us/app/storyteller-reader/id647446772...


Super cool project!

> even though there's still a lot more to do

A few have asked on this thread already, but since you're already using AI to transcribe, it would be super cool if we could use AI to generate audio using TTS.

I quit Audible (after signing up a few times) because there are very few high-quality audiobooks; even those spoken by the authors are bad (most of them are not pro narrators).

A good AI would be amazing, as it never gets tired speaking for hours, while maintaining the same energetic voice, intonation and pace.



This is pretty interesting...

I once wrote a similar thing for building a custom LJSPEECH dataset out of ebook/audiobook combinations to synthesize my favorite narrator voices using coqui-tts and the VITS model and make them "publish" books that never came out as audiobooks.

It was able to synchronize the book contents to timestamps, split the spoken word into sentences and create LJSPEECH datasets out of the combinations. I used aeneas[1]; it was a bit finicky to set up, but after a while it was even able to map non-English languages (in my case German) with more than 80% accuracy. Worked out pretty well, the LJSPEECH datasets were good (I still have them here), but the TTS tech was not there yet :-) Maybe it's time to revive this project using newer modelling approaches like XTTS or something...

[1]: https://www.readbeyond.it/aeneas/


I thought about exactly this a few years back but lacked the technical skills to implement it. There are some great books out there as you mentioned, but even worse are great books with mediocre narration/production. E.g., A Song of Ice and Fire on Audible is absolutely horrid. The Martian by Andy Weir is fantastic. Can I transplant Wil Wheaton or Greg Tremblay into GOT? Can I have multiple characters narrated by different voices?

Please revisit it if you can.


You can do this today, though you would definitely be breaking copyright (you need to strip the DRM from the epub), and if you're cloning someone's voice without their permission you're probably breaking some more laws. You're pretty safe though assuming you don't distribute it or try to make money.

Check out https://github.com/aedocw/epub2tts for creating an audiobook from epub. Take a look in the utils directory for notes about fine-tuning a voice clone. I can tell you I've done some voices that are close enough to the original to be pretty shocking.

Feel free to get in touch if you have any questions, it's pretty fun making your own audiobooks with the reader of your choice!


IMHO the original narration on The Martian by RC Bray is better than Wil's. I enjoyed Wil's work on Ernest Cline's books but RC Bray and Dennis E. Taylor are (for me) top of the mountain when it comes to SF narration.


You didn't include the link: https://smoores.gitlab.io/storyteller/

Looks super nice, the next step is to build a fully synced ecosystem for book management.


You mean a system like Audiobookshelf[0]? I can highly recommend this, by the way. Works more reliably than any paid service I've ever tried.

[0] https://www.audiobookshelf.org/


I'm more interested in something that would unite audiobooks and textual books.

I love to jump between listening (in my car or while walking) and reading. Right now, only Amazon Kindle + Audible provides a good experience, but it's impossible to import your own audiobooks into Audible.


Yup, this is the goal! Library management and a reader app already exist, though there's definitely work to be done, especially on the library management front.


What I really want from the new era of machine learning we're supposedly going through is human-quality, self-hosted text-to-speech and speech-to-text, so I could listen to text ebooks and convert big podcasts and video/audio lecture courses to text, making it easy to search through them and quote phrases from them. Is this it? Everything I could find so far was either significantly worse than what a human could do or an expensive online service.


Amazing, I’ve been wanting something like this for years. If only Libby would integrate this so it could be used with rented books.

It would be great if you could add a link to the app on the App Store.


I keep forgetting to do this! Here's a link: https://apps.apple.com/us/app/storyteller-reader/id647446772... and I'll push up a change to the docs right now that includes that.


What’s your use case for it?


Audiobooks while running / cooking / other activity where reading doesn’t make sense.

Ebook elsewhere.


Looks great! Is there an e-ink e-reader it’s compatible with? Would love to abandon the Amazon castle but could not go back to reading on a screen.


Looks like an interesting project.

I do highly suggest that a quick intro demo video and/or screen shots of a tool like this would be beneficial to the project.


Thanks! I think you're probably right. In the meantime, there are some screenshots on the App Store page for the reader app: https://apps.apple.com/us/app/storyteller-reader/id647446772...


Awesome! Thanks for sharing and working on this! WhisperSync functionality is a game-changer; it's one of the main reasons I'm able to read so much (switching modalities several times per day). I'd love to see this featureset become ubiquitous instead of being so tightly coupled to proprietary, DRM'd Amazon / Audible.


Thanks, I'm so glad folks seem to like it! Agreed, I remain astonished how limited support is for synced narration (forget non-DRM; really only Amazon even provides this feature!). It's totally changed how I consume books. Hopefully EPUB's Media Overlay spec (which Storyteller uses) will become more widespread!


Looks really nice. I wanted to do exactly this with my project Kavita, but have been distracted with other things. I've heard Whisper has great potential and a few of my users have been doing something similar with it.

Looking forward to seeing how this project matures. We need more options in the book reading scene that are self-hosted and not Calibre.


This looks absolutely incredible, and like something I’ve been trying to find for years! Thank you so much for building this!


Thank you, that's so exciting! Please let me know if you try it out, I'd really love to get your feedback!


How does the narration work? Is it automatically generated? For a year now I've had a long commute and listened to audiobooks. However, I find the narration varies wildly in quality and think text-to-speech might often actually be better.


> Once we have individual tracks to work with, we begin transcription. This is the most resource intensive part of the process. We rely on the Whisper AI transcription model from OpenAI, via WhisperX. The WhisperX project also uses wav2vec2 to provide accurate word-level timestamps, which is important for sentence-level synchronization. The transcription process is fairly standard; the only interesting addition to the process that Storyteller makes is to supply an "initial prompt" to the transcription model, outlining its task as transcribing an audiobook chapter and providing a list of words from the book that don't exist in the English dictionary as hints.

https://smoores.gitlab.io/storyteller/docs/how-it-works/the-...
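As an illustration of the "initial prompt" idea quoted above, here is a minimal sketch that collects words from the book that are missing from a dictionary and folds them into a prompt string. The function name, the prompt wording, and the word limit are all assumptions for illustration, not what Storyteller actually sends.

```python
import re


def build_initial_prompt(book_text: str, dictionary: set[str], max_words: int = 50) -> str:
    """Build a transcription hint listing words not found in the dictionary.

    dictionary is expected to contain lowercase words. Unknown words are
    kept in first-seen order and capped at max_words.
    """
    unknown: dict[str, None] = {}  # dict preserves insertion order
    for word in re.findall(r"[A-Za-z']+", book_text):
        if word.lower() not in dictionary:
            unknown.setdefault(word, None)
    hints = ", ".join(list(unknown)[:max_words])
    # Hypothetical wording; the real prompt text is not given in the docs quote.
    return f"The following is an audiobook chapter. Unusual words: {hints}."


prompt = build_initial_prompt(
    "Frodo went to the Shire. Frodo ate.",
    dictionary={"went", "to", "the", "ate"},
)
print(prompt)
```

How the resulting string is handed to the model depends on the transcription library you use.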


You provide an audiobook and an ebook and it syncs them.


As others have said, you provide the audiobook (which could technically be something you generated yourself with TTS!) and Storyteller syncs it. However, I've added an issue on GitLab to investigate building TTS directly into Storyteller, because not all books have audiobooks, and it would be cool to fill that gap!


Good job! I'm probably going to use this. Would love to have my collection accessible from mobile. A small "nit". Would be great to have non-docker installation instructions readily available.


Thanks! What's your preferred installation/setup? I started with docker for simplicity/ubiquity, but I know it's not everyone's cup of tea. The API server in particular can be a bit challenging to set up properly, but if I know what folks are looking for I can try to provide some guidance!


Most likely I'll try this on an RPi, as a dedicated machine, running Linux or FreeBSD.

I did not go through the minimum requirements, but I have a LAMP stack running RADIUS for a small shop and so far it runs happily on an RPi 2 featuring 1 GB of RAM. I have daily backups and everything required to spin up a clone if the SD card becomes faulty.


This looks awesome. I might be missing it somewhere, but what's the minimum required hardware to run something like this locally?


Can this function as "Plex for audiobooks"? I don't really have a need for synced books but it would be nice to keep fewer audiobooks on my phone.


As noted elsewhere in the comments, you can use Plex with the Prologue app to literally have Plex for audiobooks. But yes, it seems this will also do what you want.


Thanks. Inherent reliability issues with Plex aside, Prologue is iOS-exclusive.


Is there a demo of the narration? I couldn’t find any


It doesn't generate narration, it syncs existing audio books with their written counterparts by transcribing the audio.


This is super cool, I love my audiobook app (Prologue) but this could tempt me away. Looking forward to setting this up and trying it out!


Prologue is also my go-to audiobook app, and I really do love it, too. It's a significant inspiration for me for the reader apps; hopefully one day soon it will have feature parity!


Is there a tool that would convert an ebook to a single mp3 (or a set of them)?


Thank you for your hard work!!


Does it sync the reading progress of the ebook among clients?


Not yet, but it's on the roadmap (https://gitlab.com/smoores/storyteller/-/issues/13)! This one actually ought to be pretty straightforward; the only trick to it is making sure that it doesn't interfere with the local-first goals of the apps.


This works only for drm-free ebooks and audiobooks?


That's correct; it requires the ability to analyze the actual contents of the ebook and audiobook, so they can't be locked down with DRM. This is unfortunately pretty limiting, but at least for audiobooks, online stores like libro.fm have pretty massive catalogues of DRM-free audiobooks!


That's what the docs say, yes


man.. if someone could hook the creation service into audiobookshelf this could be an extremely potent combination..



