I like your project, but I'd like it more if it were accompanied by a video of the text being highlighted as it is read (a kind of dual-modality reading: visual + audio). I created such a tool for myself on macOS with the Alex voice; it works in browsers on any page and in PDFs. I find that dual-modality reading enhances my focus a lot. Btw, I use this tool to read this very thread of comments.
And of course I would want to hear the amazing WaveNet voices used in this role.
The DAISY markup spec is a cool project if you’re into the tech of text/audio sync.
Also check out LearningAlly.org, a non-profit that produces audiobooks with synced text highlights. They are specifically oriented toward students with reading challenges, but I believe anyone with some certification of a learning difference can join. (They require certification to avoid conflict with copyright protections.)
It occurs to me that the mistakes in emphasis that I hear in these samples are the same mistakes I hear from young readers who are concentrating on decoding the words one at a time. The method that fluent text speakers use is to process the entire sentence while beginning to speak it.
I wonder if a better model of word emphasis considering whole sentences could move an automated reader out of the uncanny valley.
Hey, I just created a similar application using the Cloud Text-To-Speech API from Google and the textract (https://textract.readthedocs.io) library to extract text from lots of kinds of documents:
Hi! I built this to have an easier way to convert ebooks to audiobooks! The backend is powered by AWS Polly. If you have any feedback or feature requests, please feel free to drop me an email at <last 3 characters of username> @ <myusername>.com
What is the typical wait from payment to delivery? I know you can't be specific, but just ballpark it (e.g. a few minutes, a few hours, etc.).
I'm not sure what the book's letter-count is, but the price was $2.82. It's only been about 10 minutes, so I'm not displeased, just curious what to expect.
It looks good! Could you add a plain-text (or even markdown) field as an input option? I'd be interested in trying this with blog posts and magazine articles.
If you have any suggestions, do let me know! I am currently working on adding support for PDF files. I'm also considering adding support for the new Google WaveNet speech synthesis, but it is much more expensive (about 4x the cost) :(
I built something similar to listen to Paul Graham's essays. It's a console app and uses OS X's "say" command for the TTS. Contributions are welcome.
https://github.com/hemantasapkota/awesome-essays
Interesting. The MetaMask phishing detector keeps a blacklist of URLs/domains and compares a site's domain against it using the Levenshtein distance algorithm, so it could be a false positive. After a quick check I didn't find Auditus on there:
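For anyone curious how an edit-distance check can end up flagging an innocent site, here is a minimal sketch in Python. It is not MetaMask's actual code: the function names, the example domains, and the threshold of 2 are all assumptions made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def looks_suspicious(domain: str, watched_domains, max_distance: int = 2) -> bool:
    """Hypothetical check: flag a domain that is close to, but not the same as,
    a domain on the watch list. The threshold is an assumed value."""
    return any(0 < levenshtein(domain, watched) <= max_distance
               for watched in watched_domains)


# Made-up domains, for illustration only.
watch_list = ["example.com"]
print(looks_suspicious("examp1e.com", watch_list))      # one substitution away
print(looks_suspicious("example.com", watch_list))      # exact match, not flagged
print(looks_suspicious("unrelated-site.org", watch_list))
```

With a check like this, any legitimate domain that happens to sit within a couple of edits of a watched name gets flagged, which is how a false positive can arise.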
This is cool and I figured we'd be heading down this path soon enough. A lot of the best audiobooks I've listened to were narrated by people that can do multiple voices well. I was thinking that being able to produce an audiobook that uses different voices for different characters would be great. Something like Narrator:
There are a few iOS apps that do this in real time. The best one by far is 'Voice Dream', and they use the same voices. It is basically an audiobook in your pocket anytime, anywhere, for any text file. It shows the words as it reads them back and lets you start/stop/pause, adjust speed, change voice, etc. All around awesome. When the new Google voices or an equivalent make it to iOS, it will be almost human-like.
This is a good example of a tool that was created for the accessibility community (vision impaired, dyslexic) and has subsequently been adopted by mainstream readers.
It's really cool to see the applications made possible by the high quality, reasonably priced, and fairly licensed text-to-speech APIs offered by AWS, Azure, and Google Cloud.
The most fleshed out service of this type that I've found is narro.co, which offers web/pdf/epub/video/rss/email/text to audio conversions.
What are the best practices for doing the reverse: taking audio and producing text? I don't mind if the transcription is rough, and the error rate can be quite high for my purposes, but I want the process to recover rather than get stuck, so that it processes a full-length talk.
From my experiments generating subtitles from TV/movie audio tracks, accuracy ranges from 75% (worst case) to 95% (best case). If you model it as a normal distribution, expect somewhere around 85-90% accuracy. Most services provide much better accuracy for things like calls or conferences, with proper microphones and minimal background noise, than for TV shows and movies. If the input audio is noisy, I would do some noise filtering before piping it into conversion.
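On the "don't get stuck" part of the question, one common approach is to split the audio into chunks and transcribe each one independently, so a failure costs you one chunk rather than the whole talk. A rough sketch in Python, assuming you already have the chunks and some speech-to-text call; the `transcribe` callable here is a hypothetical stand-in, not any particular service's API:

```python
from typing import Callable, Iterable


def transcribe_in_chunks(chunks: Iterable[bytes],
                         transcribe: Callable[[bytes], str],
                         max_retries: int = 2) -> str:
    """Transcribe each chunk on its own, retrying a few times and leaving a
    placeholder when a chunk keeps failing, so one bad chunk cannot stall
    the rest of the talk."""
    parts = []
    for index, chunk in enumerate(chunks):
        text = None
        for _attempt in range(max_retries + 1):
            try:
                text = transcribe(chunk)  # hypothetical speech-to-text call
                break
            except Exception:
                continue  # try this chunk again, up to max_retries times
        parts.append(text if text is not None else f"[chunk {index} failed]")
    return " ".join(parts)


# Stand-in transcriber for demonstration: fails on one specific chunk.
def fake_transcribe(chunk: bytes) -> str:
    if chunk == b"noisy":
        raise RuntimeError("could not decode audio")
    return chunk.decode()


print(transcribe_in_chunks([b"hello", b"noisy", b"world"], fake_transcribe))
```

The placeholder keeps the transcript aligned with the audio, so you can go back and redo only the failed chunks later.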
As an easier problem, what I’d find useful is a way to keep a pirated audiobook and pirated e-book in sync, the way that Amazon does with WhisperSync. A single app where I upload the .epub and the mp3s and it keeps me in sync when I read in either format.
You can also do this in iBooks with any of the built-in voices available on iOS.
Just turn on Speak Screen in Settings -> General -> Accessibility -> Speech and then swipe down with two fingers while reading your book. It'll even turn the page for you.
Each conversion costs a couple of dollars depending on length, which is cheaper than most audiobooks at the expense of human realism. You can listen to a sample of a human-read version of Accelerando: https://www.audiobooks.com/audiobook/accelerando/210129
The one generated by Auditus is too smooth and slightly unnatural.
I usually do this by hand with surprisingly good results: I use calibre to convert the ePub to txt and then fix some common problems (e.g. removing line breaks and page numbers) using regular expressions. Then I convert it to an audio file using the macOS Automator text-to-speech action (be sure to download the high-quality voices first).
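The regular-expression cleanup step could look roughly like this in Python. It is only a sketch: the patterns assume page numbers sit alone on a line as bare digits and that paragraphs are separated by blank lines, which will not hold for every book, and the function name is made up.

```python
import re


def clean_book_text(text: str) -> str:
    """Remove page-number lines and unwrap hard line breaks inside paragraphs."""
    # Drop lines containing only digits (assumed to be page numbers).
    text = re.sub(r"^[ \t]*\d+[ \t]*$\n?", "", text, flags=re.MULTILINE)
    # Collapse runs of blank lines left behind into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Replace single line breaks with spaces; keep blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse repeated spaces or tabs.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


sample = "The quick brown\nfox jumps.\n\n12\n\nA new paragraph\nstarts here.\n"
print(clean_book_text(sample))
```

A chapter title that is just a number (or a year on its own line) would be removed as well, so it is worth skimming the output before generating audio.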
Love the idea behind the project. I tried to upload an EPUB and got an error page. I tried 3 times, with different voices. I look forward to seeing more of it and think it's an awesome idea.
This doesn't seem to be working. I have tried uploading a sample epub. After the epub is uploaded it sends me to a conversions page. That page is just a copy of the homepage.