Speak IPA – Text to Speech for International Phonetic Alphabet

saurik · on Jan 29, 2017

The IPA itself isn't really sufficient to encode the sound of a word: you also need to know what language is being spoken; there is a little book you can get of vowel charts for all of the major languages in the world that tries to document the exact vocal position of each symbol. The issue is that there's a continuum of sound that can be generated by the human vocal system, and while humans don't want to differentiate between subtle variations while trying to talk to each other (and so you wouldn't expect two words with massively different meanings based on some extremely subtle and difficult to hear difference), they can notice the difference (which is part of what causes people to have "accents" based on their native languages: because they are pronouncing vowels in some slightly different part of their mouth). IPA only represents the major possibilities, not the specific variants (though there are extended and increasingly complex symbols to try to provide some more accuracy, it is again a continuum and not something you can represent with a small and finite set of symbols per vowel: you'd have to start putting a couple numbers down to represent the sound ;P).

Sadly, I'm having a very difficult time finding a good reference for this online. I know all of this because I spent years studying graduate Linguistics at UCSB while I was trying to get a PhD in Computer Science, and I carried around that little book of per-language vowel charts for a long time ;P.

http://www.antimoon.com/how/english-vowel-chart.htm

> For example, the average British /æ/ is slightly more open (more like /a/) than the average American /æ/.

That said, I typed all of that under an expectation that it was going to work really well, but in practice this website sounds a lot like Dr. Sbaitso, and so the nuance of pronunciation is totally lost anyway ;P.

pluma · on Jan 29, 2017

One minor nit: it's important to distinguish between phonemic and phonetic transcriptions in IPA. The former is what's usually found in dictionaries and does not account for variance in dialects and speakers. The latter attempts to represent utterances as they are actually produced by speakers.

There's still some nuance that is lost in transcription but phonetic IPA transcriptions can achieve a pretty close approximation to the real utterances.

saurik · on Jan 29, 2017

This is true, but you make it sound like this is a response to and even a solution for my comment, when I was actually assuming a phonetic transcription: such transcriptions also vary in how narrowly they define the sound involved, and it is the nuance of sound on which I was commenting, as even in a very narrow phonetic transcription (with tons of marks to try to adjust the sounds of the phonemes), you just can't represent what a native speaker sounds like using these symbols without adjusting for language; and I guess we just have to disagree on how "pretty close" the result is, as when I thought about what I would want to use an IPA->speech tool to accomplish, they all involved vowel charts ;P.

3131s · on Jan 30, 2017

> This is true, but you make it sound like this is a response to and even a solution for my comment

I think it's a suitable response. There is enough allophonic variation among speakers of a single language that undoubtedly someone perfectly reciting a narrow IPA transcription could be taken as a plausible native speaker. Even for your purposes (though I'm still not clear on what type of task you're envisioning) the IPA could still be useful as an intermediary layer of abstraction, as in storing a mapping by language of IPA vowel symbols to the exact formants required.
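
A minimal sketch of such a mapping in Python, assuming each language profile stores a first and second formant (F1, F2) per vowel symbol. The names `FORMANTS` and `formants_for` are hypothetical, and every number is a rough placeholder for illustration, not a measurement:

```python
# Hypothetical per-language table: IPA vowel symbol -> (F1, F2) in Hz.
# All values are illustrative placeholders, NOT measured data.
FORMANTS = {
    "en-US": {"i": (270, 2290), "æ": (660, 1720), "ɑ": (730, 1090), "u": (300, 870)},
    "en-GB": {"i": (280, 2250), "æ": (750, 1600), "ɑ": (680, 1100), "u": (320, 1000)},
}


def formants_for(symbol, language):
    """Look up the formant targets for an IPA vowel in one language profile."""
    try:
        return FORMANTS[language][symbol]
    except KeyError:
        raise KeyError(
            f"no formant target for {symbol!r} in profile {language!r}"
        ) from None


# The same symbol resolves to different targets depending on the profile.
print(formants_for("æ", "en-US"))
print(formants_for("æ", "en-GB"))
```

The point is only that the symbol stays the same while the acoustic target is chosen per language; a real table would need measured values and more than two formants.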

bearbin · on Jan 29, 2017

Thanks for taking the time to critique this - it's only something I put together in a few hours for fun. I'm sure somebody with more skill than me would be able to make something like this for multiple languages / accents.

glup · on Jan 29, 2017

This could actually be quite useful for generating stimuli for artificial language learning experiments (used in psychology and linguistics) where you don't want to model longer-distance effects among speech sounds (avoiding correlated cues), and need a 1:1 correspondence between symbols and sounds. Do you have a command line version?

One other thought: you could compare the output with espeak (http://espeak.sourceforge.net/) or use espeak to generate IPA transcriptions for various languages.

echelon · on Jan 29, 2017

This is really cool! Thanks for sharing with us!

Is this using parametric synthesis? (It doesn't sound concatenative.) Do you have a background in speech, signal processing, or audio, and is this just a passing interest, or something you want to continue to explore?

I've been teaching myself speech algorithms and methods off and on for the past six months. Recently I developed a concatenative Donald Trump text to speech engine (I've posted about it in the past), but the samples aren't great and it doesn't use proper unit selection. I'm trying to apply ML to generate a massive set of smooth n-phones that concatenate well together.

I'd definitely like to exchange contact info if you're into speech synthesis long term. My info is in my profile.

In any case, really cool project! :)

didymospl · on Jan 29, 2017

It may not be sufficient but it is surely very useful when you have no idea how to pronounce the word. Besides, I think it works quite well for English. When I started learning it I had to rely on the IPA transcriptions every time I looked up new words in a dictionary. Even now, when online dictionaries have nice recordings of the vocabulary spoken by native speakers, I still use it for proper nouns (names etc.) that are not included in dictionaries when I can't find the recorded pronunciation e.g. on YouTube.

PepeGomez · on Jan 29, 2017

It only works for English. Any other language seems to get mangled into English-like syllables.

3131s · on Jan 30, 2017

Are you talking about this project or the IPA generally?

PepeGomez · on Jan 30, 2017

This "project" specifically. It conflates ɾ and t and it ignores most non-English IPA letters.

saycheese · on Jan 29, 2017

>> "there is a little book you can get of vowel charts for all of the major languages in the world that tries to document the exact vocal position of each symbol."

What is the name of the book? Do you have a link to it on Amazon or the ISBN for the version you're recommending?

saurik · on Jan 29, 2017

I am pretty sure the book I had was the "Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet".

https://www.amazon.com/dp/0521637511

nereye · on Jan 29, 2017

For a comprehensive site proposing a more accurate representation than IPA, named canIPA, please see http://venus.unive.it/canipa/dokuwiki/doku.php?id=en:start.

lottin · on Jan 29, 2017

Quite true. I guess you would need different "profiles" corresponding to various languages or major dialects so that each symbol is interpreted correctly.

Mizza · on Jan 29, 2017

It'd be very, very useful if you could F/OSS this, so that we could build something to help illustrate hard-to-pronounce Wikipedia articles, which currently have to be manually pronounced. There are thousands of these needed!

XaspR8d · on Jan 29, 2017

It's amazing, given how useful IPA is (at least at providing a reasonable-fidelity intermediate form between other, more accurate representations), how few FOSS projects I've seen use it.

Really what I'm saying is I'd like to see someone build an automated pun-discovery tool.

asaibx · on Jan 30, 2017

This repo[1] contains IPA dictionaries for 17 languages, including English, Spanish, and Chinese in JSON/CSV/XML/plain text format.

If you really want to find puns programmatically, the releases section[2] has a ready-made package with homonyms in all the languages, including English. It should be trivial to make an online service that searches through this file for matches on particular words.

[1] https://github.com/open-dict-data/ipa-dict [2] https://github.com/open-dict-data/ipa-dict/releases
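
A rough sketch of that search in Python, assuming the dictionary is available as lines of `word<TAB>/ipa/` with comma-separated variants. That layout is an assumption here, not a confirmed description of the repo's files, and `parse_lines` and `find_homophones` are hypothetical helper names. The sample entries are made up for the demonstration:

```python
from collections import defaultdict


def parse_lines(lines):
    """Yield (word, ipa) pairs from 'word<TAB>/ipa/' lines (assumed layout)."""
    for line in lines:
        word, _, ipa_field = line.rstrip("\n").partition("\t")
        for ipa in ipa_field.split(","):
            ipa = ipa.strip().strip("/")
            if word and ipa:
                yield word, ipa


def find_homophones(entries):
    """Group words that share an identical IPA transcription."""
    by_sound = defaultdict(set)
    for word, ipa in entries:
        by_sound[ipa].add(word)
    # Keep only transcriptions shared by at least two different words.
    return {ipa: sorted(words) for ipa, words in by_sound.items() if len(words) > 1}


# Made-up sample entries, for illustration only.
sample = ["knight\t/naɪt/", "night\t/naɪt/", "pair\t/pɛɹ/", "pear\t/pɛɹ/", "cat\t/kæt/"]
print(find_homophones(parse_lines(sample)))
```

Exact matching only finds perfect homophones; near-puns would need some distance measure over the IPA strings, which this sketch does not attempt.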

DonaldFisk · on Jan 29, 2017

It would be interesting to learn the approach taken when developing this.

It sounds clear enough, but there are a few issues which would need to be resolved before I would use it. For example, when I entered "o:", which should be a long pure vowel, I got a diphthong. The first example [ˈnɑɹkoʊˌklɛpˈtɑkɹəsi] has the IPA for a General American accent but sounds more like Received Pronunciation to me. Some IPA characters aren't voiced at all. These might be fixable, but how easy this is depends on the implementation approach.

Decades ago, I developed a formant speech synthesizer (the details are here: http://web.onetel.com/~hibou/Formant%20Speech%20Synthesizer....). Formant speech synthesizers work by passing a pulsed or random input through a series of filters to generate speech sounds, and can be easily adapted to different accents and speakers. However, it is difficult to get them to sound natural, so they usually sound more like Daleks than people.
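
To make the "input through a series of filters" idea concrete, here is a minimal pure-Python sketch, not the synthesizer linked above. It assumes a cascade of two-pole resonators (one per formant) driven by a pulse train; the function names, the formant frequencies, and the bandwidths are all illustrative assumptions:

```python
import math


def resonator(signal, freq, bandwidth, fs):
    """Two-pole resonant filter: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    c = -math.exp(-2 * math.pi * bandwidth / fs)
    b = 2 * math.exp(-math.pi * bandwidth / fs) * math.cos(2 * math.pi * freq / fs)
    a = 1 - b - c  # normalizes the gain to 1 at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def pulse_train(f0, duration, fs):
    """Voiced source: one unit impulse per pitch period."""
    period = int(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(round(duration * fs))]


def synthesize_vowel(formants, f0=110, duration=0.3, fs=16000):
    """Pass a pulse train through one resonator per formant, in series."""
    signal = pulse_train(f0, duration, fs)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, fs)
    return signal


# Placeholder (frequency, bandwidth) pairs in Hz for an /ɑ/-like vowel.
samples = synthesize_vowel([(730, 60), (1090, 90), (2440, 120)])
print(len(samples))
```

Changing the accent or speaker then amounts to changing the formant list, which is the adaptability described above; swapping the pulse train for random noise would give the unvoiced case.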

I've also done some rule-based text to speech. This works quite well for Standard English pronunciations in a Glasgow accent, the closest accent to English spelling and therefore the one which can be most reliably generated with the smallest number of exceptions.

More recent approaches to speech synthesis sound more natural but are limited to a particular accent and speaker. It's never a Glasgow accent, and developing one for a new speaker and accent is a major undertaking. Were I to switch accents to Received Pronunciation or General American, there would be many more exceptions to the pronunciation rules. Storing pronunciations in a dictionary only works for words stored in the dictionary.

delgaudm · on Jan 29, 2017

As a voice actor, I am in LOVE with this idea, and have searched for this very thing a number of times. I'm really glad to see it created. I'm constantly faced with saying words I have no idea how to pronounce correctly, and I spend far too much time trawling through YouTube videos, or on forvo.com and its ilk, in the hopes of finding the word correctly pronounced.

I wonder, though, if it's a shortcoming of IPA that the generated pronunciations are not what I'd expect. For example, here is my hometown of Annapolis as rendered by this tool [1], compared to how it's actually pronounced [2]:

[1] https://speak-ipa.bearbin.net/speak.cgi?speak=%C9%99%CB%88n%... [2] https://youtu.be/1I71yL3SG80?t=11

As you can hear it's pretty far off, so much so I would be unable to rely on the computer generated version.

DonaldFisk · on Jan 29, 2017

The transcription you used /əˈnæpəlᵻs/ contains /ᵻ/ which doesn't appear to be a commonly used IPA character (see https://en.wikipedia.org/wiki/Near-close_central_unrounded_v...), and presumably wasn't encoded and so was absent from the output. Try /əˈnæpəlɪs/, which though it might not capture all the nuances of the local accent, at least sounds intelligible.

delgaudm · on Jan 30, 2017

That is helpful! Thank you! I had copied the notation from Wikipedia.

aliceyhg · on Jan 29, 2017

This is the first time I'm hearing about IPA. I wanted to test out how IPA works so I searched for a Chinese-to-IPA converter and came across this site: http://easypronunciation.com/en/chinese-pinyin-phonetic-tran...

I put in a few Chinese sentences, got the IPA, then pasted it into your app and listened to the sentence in IPA. Although it wasn't very accurate, it's one of the coolest things I learned this year. Thank you very much for sharing.

I think a really cool next step is to add the ability to type things in and get the IPA and pronunciation.

yorwba · on Jan 30, 2017

> I think a really cool next step is to add the ability to type things in and get the IPA and pronunciation.

http://espeak.sourceforge.net/ can already do this; however, Chinese support is currently flaky at best when using characters, because different pronunciations are not disambiguated based on context. Giving it Pinyin to work with is enough to fool me, a non-native speaker, though.

compay · on Jan 29, 2017

I've been hoping to see something like this for quite some time. Kudos for a really cool project and for making it available for people to play with.

At least for Spanish, it can pronounce some words fairly well. I wrote a Spanish orthography-to-IPA converter a few years back. It's up on Heroku until folks crash it if you want to get some Spanish words transcribed to IPA.

http://spanish-demo.herokuapp.com/

I used it to generate a few random words. Some of the sounds were off - for example ɾ (the "r" in "estar"). But many words were pronounced clearly enough to be understood.

yoz-y · on Jan 29, 2017

Some time ago I found out that the macOS say command does support IPA. However, not all words seem to work.

brownbat · on Jan 29, 2017

I always wanted to have a text-to-speech system that could generate novel accents, by taking just a couple parts of the IPA at a time and substituting other sounds.

Text-to-IPA might be easy enough with dictionaries, the swapping is trivial, but IPA-to-speech seems like a harder problem.
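
The swapping step really is small. A sketch in Python, assuming one-for-one symbol substitution; the `SWAPS` table is an invented accent for illustration, not a description of any real one:

```python
# Hypothetical substitution table for an invented accent.
SWAPS = {"θ": "t", "ð": "d", "æ": "ɛ"}


def apply_accent(ipa, swaps):
    """Substitute IPA symbols one-for-one to approximate a different accent."""
    return ipa.translate(str.maketrans(swaps))


print(apply_accent("ðæt θɪŋ", SWAPS))  # dɛt tɪŋ
```

Context-dependent changes (a sound that only shifts before certain neighbors) would need rules rather than a flat table.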

moontear · on Jan 29, 2017

Finally some closure: Worcestershire sauce is "wʊstərʃər" according to https://en.wikipedia.org/wiki/Worcestershire_sauce

DonaldFisk · on Jan 29, 2017

In Received Pronunciation /r/ is only pronounced before vowels. Otherwise, it usually modifies the previous vowel. "wʊstəʃə" is a better phonetic transcription.
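
That first rule can be sketched as a one-line rewrite in Python. The vowel inventory below is an assumed, incomplete set, `drop_nonprevocalic_r` is a hypothetical name, and the sketch ignores the second part of the rule (the effect on the preceding vowel):

```python
import re

# Assumed, incomplete vowel inventory for illustration.
VOWELS = "aeiouæɑɒɔəɛɜɪʊʌ"


def drop_nonprevocalic_r(ipa):
    """Remove /r/ unless a vowel follows, optionally across a stress mark."""
    return re.sub(rf"r(?![ˈˌ]?[{VOWELS}])", "", ipa)


print(drop_nonprevocalic_r("wʊstərʃər"))  # wʊstəʃə
```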

vorg · on Jan 30, 2017

Or even "wʊstʃə".

AceJohnny2 · on Jan 29, 2017

Hm, it didn't deal too well with Llanfairpwllgwyngyll...

https://speak-ipa.bearbin.net/speak.cgi?speak=%C9%ACan%CB%8C...

https://en.wikipedia.org/wiki/Llanfairpwllgwyngyll

tyingq · on Jan 29, 2017

Didn't know about IPA before seeing this, thanks for sharing.

Saying "hacker news" in IPA using your tool: https://speak-ipa.bearbin.net/speak.cgi?speak=%27h%C3%A6k%C9...
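
For anyone wanting to build such links from a script, a minimal Python sketch. It assumes only what the links in this thread show: the `speak.cgi` endpoint and a `speak` query parameter carrying percent-encoded UTF-8. Whether the service is still online, and what it returns, is not checked here; `speak_url` is a hypothetical helper name:

```python
from urllib.parse import quote

# Endpoint and parameter name as they appear in links in this thread.
BASE = "https://speak-ipa.bearbin.net/speak.cgi"


def speak_url(ipa):
    """Build a request URL with the IPA text percent-encoded as UTF-8."""
    return f"{BASE}?speak={quote(ipa, safe='')}"


# The first few symbols only; the full transcription is truncated in the link above.
print(speak_url("'hæk"))
```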

fishnchips · on Jan 29, 2017

I checked a few examples from my native language (Polish) and they sound amusing at best.

orthopteroid · on Jan 30, 2017

Hey cool! I finally have somewhere on the web that will make audio for my markov syllable-synthesis engine!

http://blurtmime.appspot.com/

carlob · on Jan 29, 2017

Doesn't seem to work in Safari, probably because the audio is a wav.

mrec · on Jan 29, 2017

If you mean you get prompted to download speaky.cgi, that happens in Firefox/Windows too.

msephton · on Feb 7, 2017

This is great, but I have to use Chrome to use it. I'd prefer to use Safari.

Do you plan to let Wikipedia use this? It would be really useful on their site.

amelius · on Jan 29, 2017

Back in the 80s, a friend of mine had a text-to-speech program running on a Z80 machine (I think), which sounded a lot like this.

pwdisswordfish · on Jan 29, 2017

This appears to be just a web wrapper around http://espeak.sourceforge.net/, and not a very slick one either. For one, it could use the <audio> element instead of forcing me to download the sound file.

yorwba · on Jan 30, 2017

This doesn't seem to use espeak, since while the --ipa flag can make espeak output IPA characters, it can't take IPA input (or I don't know the correct incantation).
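
For the text-to-IPA direction, a minimal Python wrapper sketch around the espeak command line, assuming `espeak` is on the PATH. It uses `-q` (no audio), `--ipa` (print IPA phonemes), and `-v` (voice); the function names are hypothetical and the output format is whatever the installed espeak version prints:

```python
import shutil
import subprocess


def espeak_ipa_command(text, voice="en"):
    """Build an espeak command that prints IPA without playing audio."""
    return ["espeak", "-q", "--ipa", "-v", voice, text]


def text_to_ipa(text, voice="en"):
    """Return espeak's IPA transcription, or None if espeak is not installed."""
    if shutil.which("espeak") is None:
        return None
    result = subprocess.run(
        espeak_ipa_command(text, voice), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


print(text_to_ipa("hacker news"))
```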