Speak IPA – Text to Speech for International Phonetic Alphabet (bearbin.net)
102 points by bearbin on Jan 29, 2017 | 40 comments



The IPA itself isn't really sufficient to encode the sound of a word: you also need to know what language is being spoken. There is a little book you can get of vowel charts for all of the major languages in the world that tries to document the exact vocal position of each symbol. The issue is that the human vocal system can produce a continuum of sound. Humans don't want to differentiate between subtle variations while talking to each other (so you wouldn't expect two words with massively different meanings to hinge on some extremely subtle, hard-to-hear difference), but they can notice the difference. That is part of what gives people "accents" from their native languages: they pronounce vowels in a slightly different part of their mouth. IPA only represents the major possibilities, not the specific variants. There are extended and increasingly complex symbols that try to provide more accuracy, but it is again a continuum, not something you can represent with a small, finite set of symbols per vowel: you'd have to start writing down a couple of numbers to represent the sound ;P.

Sadly, I'm having a very difficult time finding a good reference for this online. I know all of this because I spent years studying graduate Linguistics at UCSB while I was trying to get a PhD in Computer Science, and I carried around that little book of per-language vowel charts for a long time ;P.

http://www.antimoon.com/how/english-vowel-chart.htm

> For example, the average British /æ/ is slightly more open (more like /a/) than the average American /æ/.

That said, I typed all of that expecting this to work really well, but in practice this website sounds a lot like Dr. Sbaitso, so the nuance of pronunciation is totally lost anyway ;P.


One minor nit: it's important to distinguish between phonemic and phonetic transcriptions in IPA. The former is what's usually found in dictionaries and doesn't account for variation across dialects and speakers. The latter attempts to represent utterances as speakers actually produce them.

There's still some nuance that is lost in transcription but phonetic IPA transcriptions can achieve a pretty close approximation to the real utterances.


This is true, but you make it sound like this is a response to, and even a solution for, my comment, when I was actually assuming a phonetic transcription. Such transcriptions also vary in how narrowly they define the sound involved, and my comment was about that nuance of sound: even in a very narrow phonetic transcription (with tons of marks to adjust the sounds of the phonemes), you just can't represent what a native speaker sounds like using these symbols without adjusting for language. I guess we just have to disagree about how "pretty close" the result is; when I thought about what I would want to use an IPA->speech tool for, every use involved vowel charts ;P.


> This is true, but you make it sound like this is a response to and even a solution for my comment

I think it's a suitable response. There is enough allophonic variation among speakers of a single language that someone perfectly reciting a narrow IPA transcription could undoubtedly pass as a plausible native speaker. Even for your purposes (though I'm still not clear on what type of task you're envisioning), IPA could still be useful as an intermediate layer of abstraction, e.g. by storing a per-language mapping from IPA vowel symbols to the exact formants required.
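A minimal sketch of the "per-language mapping from IPA vowel symbols to formants" idea. The profile names and every (F1, F2) number below are placeholders chosen for illustration, not measurements. A real profile would take its values from per-language vowel charts. The "en-GB" /æ/ is only given a higher F1 (more open) to echo the antimoon note quoted earlier.

```python
from typing import Dict, List, Tuple

Formants = Tuple[int, int]  # (F1, F2) in Hz

# HYPOTHETICAL profiles; all numbers are illustrative placeholders.
PROFILES: Dict[str, Dict[str, Formants]] = {
    "en-US": {"i": (270, 2290), "æ": (660, 1720), "ɑ": (730, 1090), "u": (300, 870)},
    # Same symbols, different targets: /æ/ is more open (higher F1) here.
    "en-GB": {"i": (280, 2250), "æ": (750, 1600), "ɑ": (680, 1100), "u": (310, 1000)},
}

def vowel_targets(ipa: str, language: str) -> List[Tuple[str, int, int]]:
    """Return (symbol, F1, F2) for each vowel in `ipa` known to the profile.

    Iterates code point by code point, so multi-character symbols
    (diacritics, length marks) are simply skipped in this sketch.
    """
    profile = PROFILES[language]
    return [(ch, *profile[ch]) for ch in ipa if ch in profile]

print(vowel_targets("kæt", "en-US"))
print(vowel_targets("kæt", "en-GB"))
```

The IPA string stays language-neutral and the profile supplies the acoustics, which is roughly the "profiles" idea raised further down the thread.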


Thanks for taking the time to critique this - it's only something I put together in a few hours for fun. I'm sure somebody with more skill than me would be able to make something like this for multiple languages / accents.


This could actually be quite useful for generating stimuli for artificial language learning experiments (used in psychology and linguistics), where you don't want to model longer-distance effects among speech sounds (avoiding correlated cues) and need a 1:1 correspondence between symbols and sounds. Do you have a command line version?

One other thought: you could compare the output with espeak (http://espeak.sourceforge.net/) or use espeak to generate IPA transcriptions for various languages.
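As a rough sketch of the second suggestion, here espeak generates IPA for several languages. It assumes an espeak binary on PATH that supports the -q (no audio), --ipa (print phonemes as IPA) and -v (voice) options; espeak-ng installs are sometimes only available as "espeak-ng". The voice names used are examples.

```python
import shutil
import subprocess
from typing import List

def espeak_ipa_command(text: str, voice: str = "en") -> List[str]:
    # -q: don't play audio; --ipa: write phonemes as IPA to stdout; -v: voice
    return ["espeak", "-q", "--ipa", "-v", voice, text]

def espeak_ipa(text: str, voice: str = "en") -> str:
    """Run espeak and return its IPA transcription of `text`."""
    if shutil.which("espeak") is None:
        raise FileNotFoundError("espeak is not installed or not on PATH")
    result = subprocess.run(
        espeak_ipa_command(text, voice),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__" and shutil.which("espeak"):
    for voice, word in [("en", "hello"), ("es", "hola"), ("de", "hallo")]:
        print(voice, espeak_ipa(word, voice))
```

The output could then be pasted into Speak IPA to compare the two synthesizers on the same transcription.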


This is really cool! Thanks for sharing with us!

Is this using parametric synthesis? (It doesn't sound concatenative.) Do you have a background in speech, signal processing, or audio, and is this just a passing interest, or something you want to continue to explore?

I've been teaching myself speech algorithms and methods off and on for the past six months. Recently I developed a concatenative Donald Trump text-to-speech engine (I've posted about it in the past), but the samples aren't great and it doesn't use proper unit selection. I'm trying to apply ML to generate a massive set of smooth n-phones that concatenate well together.
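For reference, unit selection is commonly framed as choosing one recorded candidate per target position so that the sum of target costs (how well a unit fits its slot) and join costs (how smoothly adjacent units concatenate) is minimal. That is a Viterbi-style dynamic program. The sketch below is generic and is not the engine described above. The units, the (phone, pitch) features and both cost functions are toy assumptions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification
U = TypeVar("U")  # candidate unit

def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Viterbi search for the unit sequence minimising target + join cost."""
    # best[i][j] = (cost of best path ending at candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(targets[i], u), prev_j))
        best.append(row)
    # Backtrack from the cheapest final state.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total

# Toy units: (phone, pitch in Hz); targets give the desired pitch.
tc = lambda t, u: abs(t[1] - u[1]) + (0 if t[0] == u[0] else 100)
jc = lambda a, b: abs(a[1] - b[1])  # penalise pitch jumps at joins
targets = [("h", 100), ("ɛ", 110), ("l", 105)]
candidates = [[("h", 90), ("h", 130)], [("ɛ", 112), ("ɛ", 140)], [("l", 100), ("l", 138)]]
print(select_units(targets, candidates, tc, jc))
```

In a real system the join cost would compare spectral features at the boundary rather than a single pitch value.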

I'd definitely like to exchange contact info if you're into speech synthesis long term. My info is in my profile.

In any case, really cool project! :)


It may not be sufficient, but it is surely very useful when you have no idea how to pronounce a word. Besides, I think it works quite well for English. When I started learning English I had to rely on the IPA transcriptions every time I looked up new words in a dictionary. Even now that online dictionaries have nice recordings of native speakers, I still use it for proper nouns (names, etc.) that aren't in dictionaries, when I can't find a recorded pronunciation, e.g. on YouTube.


It only works for English. Any other language seems to get mangled into English-like syllables.


Are you talking about this project or the IPA generally?


This "project" specifically. It conflates ɾ and t, and it ignores most non-English IPA letters.


>> "there is a little book you can get of vowel charts for all of the major languages in the world that tries to document the exact vocal position of each symbol."

What is the name of the book? Do you have a link to it on Amazon or the ISBN for the version you're recommending?


I am pretty sure the book I had was the "Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet".

https://www.amazon.com/dp/0521637511


For a comprehensive site proposing a more accurate representation than IPA, named canIPA, please see http://venus.unive.it/canipa/dokuwiki/doku.php?id=en:start.


Quite true. I guess you would need different "profiles" corresponding to various languages or major dialects so that each symbol is interpreted correctly.


It'd be very, very useful if you could F/OSS this, so that we could build something to illustrate pronunciations in Wikipedia articles about hard-to-pronounce subjects, which currently have to be recorded manually. There are thousands of these needed!


Given how useful IPA is (at least as a reasonable-fidelity intermediate form between other, more accurate representations), it's amazing how few FOSS projects I've seen use it.

Really what I'm saying is I'd like to see someone build an automated pun-discovery tool.


This repo[1] contains IPA dictionaries for 17 languages, including English, Spanish, and Chinese in JSON/CSV/XML/plain text format.

If you really want to find puns programmatically, the releases section[2] has a ready-made package with homonyms in all the languages, including English. It should be trivial to make an online service that searches through this file for matches on particular words.

[1] https://github.com/open-dict-data/ipa-dict [2] https://github.com/open-dict-data/ipa-dict/releases
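Here is a minimal homophone (pun candidate) finder over such a dictionary. It assumes the plain-text files hold one entry per line as word, a tab, then one or more slash-delimited IPA pronunciations separated by commas. Check the actual files before relying on that format. The sample lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

STRESS_MARKS = set("ˈˌ")

def parse_ipa_dict(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (word, ipa) pairs from lines like 'word\\t/ipa1/, /ipa2/' (assumed format)."""
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue
        word, prons = line.split("\t", 1)
        for pron in prons.split(","):
            yield word, pron.strip().strip("/")

def find_homophones(lines: Iterable[str], ignore_stress: bool = True) -> Dict[str, List[str]]:
    """Group words whose transcriptions are identical (optionally ignoring stress)."""
    by_sound = defaultdict(set)
    for word, ipa in parse_ipa_dict(lines):
        key = "".join(c for c in ipa if c not in STRESS_MARKS) if ignore_stress else ipa
        by_sound[key].add(word)
    return {k: sorted(v) for k, v in by_sound.items() if len(v) > 1}

# Made-up sample lines in the assumed format.
sample = ["flour\t/ˈfɫaʊɝ/", "flower\t/ˈfɫaʊɝ/", "cat\t/ˈkæt/"]
print(find_homophones(sample))
```

For near-puns rather than exact homophones, the same index could be queried with an edit distance over IPA segments instead of an exact key match.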


It would be interesting to learn the approach taken when developing this.

It sounds clear enough, but there are a few issues that would need to be resolved before I would use it. For example, when I entered "o:", which should be a long pure vowel, I got a diphthong. The first example [ˈnɑɹkoʊˌklɛpˈtɑkɹəsi] has the IPA for a General American accent but sounds more like Received Pronunciation to me. Some IPA characters aren't voiced at all. These might be fixable, but how easily depends on the implementation approach.

Decades ago, I developed a formant speech synthesizer (the details are here: http://web.onetel.com/~hibou/Formant%20Speech%20Synthesizer....). Formant speech synthesizers work by passing a pulsed or random input through a series of filters to generate speech sounds, and can be easily adapted to different accents and speakers. However, it is difficult to get them to sound natural, so they usually sound more like Daleks than people.
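Below is a toy formant synthesizer in the style described: a pulse train (voiced) or random noise (unvoiced) passes through a cascade of two-pole resonators, one per formant. The resonator uses the common Klatt-style coefficients. The formant and bandwidth numbers for the /ɑ/-like vowel are illustrative placeholders, and this is not the linked synthesizer's code. Adapting it to another accent mostly means swapping the formant table.

```python
import io
import math
import random
import struct
import wave
from typing import List, Sequence, Tuple

def resonator(signal: Sequence[float], freq: float, bw: float, fs: int) -> List[float]:
    """Two-pole digital resonator (Klatt-style), normalised to unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synth(formants: Sequence[Tuple[float, float]], f0: float = 120.0, dur: float = 0.5,
          fs: int = 16000, voiced: bool = True, seed: int = 0) -> List[float]:
    """Filter a pulse train (voiced) or white noise (unvoiced) through cascaded resonators."""
    n = int(dur * fs)
    if voiced:
        period = int(fs / f0)
        source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    else:
        rng = random.Random(seed)
        source = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    signal = source
    for freq, bw in formants:  # cascade: each filter feeds the next
        signal = resonator(signal, freq, bw, fs)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to [-1, 1]

def to_wav_bytes(samples: Sequence[float], fs: int = 16000) -> bytes:
    """Encode samples in [-1, 1] as 16-bit mono PCM WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()

# Placeholder (frequency, bandwidth) pairs in Hz for an /ɑ/-like vowel.
AH = [(730, 90), (1090, 110), (2440, 170)]

if __name__ == "__main__":
    with open("ah.wav", "wb") as f:
        f.write(to_wav_bytes(synth(AH)))
```

The "Dalek" quality mentioned above comes largely from the perfectly regular impulse source. Real glottal pulse shapes and small pitch and amplitude jitter help.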

I've also done some rule-based text to speech. This works quite well for Standard English pronunciations in a Glasgow accent, the closest accent to English spelling and therefore the one which can be most reliably generated with the smallest number of exceptions.

More recent approaches to speech synthesis sound more natural but are limited to a particular accent and speaker. It's never a Glasgow accent, and developing one for a new speaker and accent is a major undertaking. Were I to switch accents to Received Pronunciation or General American, there would be many more exceptions to the pronunciation rules. Storing pronunciations in a dictionary only works for words stored in the dictionary.


As a voice actor, I am in LOVE with this idea, and have searched for this very thing a number of times. I'm really glad to see it created. I'm constantly faced with saying words I have no idea how to pronounce correctly, and I spend far too much time trawling through YouTube videos, forvo.com, and its ilk in the hope of finding the word pronounced correctly.

I wonder, though, whether it's a shortcoming of IPA that the generated pronunciations are not what I'd expect. For example, compare my hometown of Annapolis as rendered by this tool [1] with how it's actually pronounced [2].

[1]https://speak-ipa.bearbin.net/speak.cgi?speak=%C9%99%CB%88n%... [2]https://youtu.be/1I71yL3SG80?t=11

As you can hear it's pretty far off, so much so I would be unable to rely on the computer generated version.


The transcription you used /əˈnæpəlᵻs/ contains /ᵻ/ which doesn't appear to be a commonly used IPA character (see https://en.wikipedia.org/wiki/Near-close_central_unrounded_v...), and presumably wasn't encoded and so was absent from the output. Try /əˈnæpəlɪs/, which though it might not capture all the nuances of the local accent, at least sounds intelligible.
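Silently dropping unknown symbols is what made the Annapolis example confusing. A front end could instead substitute known fallbacks and report anything it still can't voice. The sketch below uses a hypothetical supported-symbol set and a single fallback (ᵻ → ɪ, as suggested above). The tool's real inventory is not documented in this thread.

```python
from typing import List, Tuple

# HYPOTHETICAL inventory of symbols a synthesizer can voice.
SUPPORTED = set("abdefghijklmnoprstuvwzæðŋɑɒɔəɛɜɡɪɹʃʊʌʒθˈˌː")
# HYPOTHETICAL fallbacks for symbols outside the inventory.
SUBSTITUTES = {"ᵻ": "ɪ"}

def check_ipa(ipa: str) -> Tuple[str, List[str]]:
    """Apply fallbacks; return the new string and any symbols still unsupported."""
    out: List[str] = []
    unknown: List[str] = []
    for ch in ipa:
        ch = SUBSTITUTES.get(ch, ch)
        if ch not in SUPPORTED and not ch.isspace():
            unknown.append(ch)
        out.append(ch)  # keep unknowns visible instead of silently dropping them
    return "".join(out), unknown

print(check_ipa("əˈnæpəlᵻs"))
```

Returning the unknown symbols lets the UI warn the user rather than producing audio with a missing vowel.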


That is helpful! Thank you! I had copied the notation from wikipedia.


This is the first time I'm hearing about IPA. I wanted to test how IPA works, so I searched for a Chinese-to-IPA converter and came across this site: http://easypronunciation.com/en/chinese-pinyin-phonetic-tran...

I put in a few Chinese sentences, got the IPA, then pasted it into your app and listened to the sentences. Although it wasn't very accurate, it's one of the coolest things I've learned this year. Thank you very much for sharing.

I think a really cool next step is to add the ability to type things in and get the IPA and pronunciation.


> I think a really cool next step is to add the ability to type things in and get the IPA and pronunciation.

http://espeak.sourceforge.net/ can already do this; however, Chinese support is currently flaky at best when using characters, because different pronunciations are not disambiguated based on context. Giving it Pinyin to work with is enough to fool me, a non-native speaker, though.


I've been hoping to see something like this for quite some time. Kudos for a really cool project and for making it available for people to play with.

At least for Spanish, it can pronounce some words fairly well. I wrote a Spanish orthography-to-IPA converter a few years back. If you want to get some Spanish words transcribed to IPA, it's up on Heroku (until folks crash it).

http://spanish-demo.herokuapp.com/

I used it to generate a few random words. Some of the sounds were off - for example ɾ (the "r" in "estar"). But many words were pronounced clearly enough to be understood.
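Spanish spelling is regular enough that a handful of ordered rules gets a broad transcription, including the tap/trill split (ɾ in "estar" and "pero", r in "perro" and word-initially). This is a toy sketch, not the converter linked above. It assumes seseo (c/z → s) and yeísmo (ll → ʝ), ignores stress and allophones like [β ð ɣ], and skips cases such as güe/güi.

```python
FRONT = {"e", "i", "é", "í"}
VOWELS = set("aeiouáéíóú")
PLAIN = str.maketrans("áéíóú", "aeiou")  # stress is not marked in this sketch
DIGRAPHS = {"ch": "tʃ", "ll": "ʝ", "rr": "r", "qu": "k"}
SINGLE = {"ñ": "ɲ", "j": "x", "v": "b", "z": "s", "h": "", "x": "ks"}

def spanish_to_ipa(word: str) -> str:
    """Toy broad transcription of a single Spanish word (Latin American-style)."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        ch = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        pair = ch + nxt
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        if pair == "gu" and i + 2 < len(w) and w[i + 2] in FRONT:
            out.append("ɡ")  # the u is silent in gue/gui
            i += 2
            continue
        if ch == "r":
            prev = w[i - 1] if i > 0 else ""
            # Trill word-initially and after l, n, s; tap elsewhere.
            out.append("r" if prev in {"", "l", "n", "s"} else "ɾ")
        elif ch == "c":
            out.append("s" if nxt in FRONT else "k")
        elif ch == "g":
            out.append("x" if nxt in FRONT else "ɡ")
        elif ch == "y":
            out.append("ʝ" if nxt in VOWELS else "i")
        elif ch in SINGLE:
            out.append(SINGLE[ch])
        else:
            out.append(ch.translate(PLAIN))
        i += 1
    return "".join(out)

for w in ["estar", "perro", "pero", "rosa", "queso", "cena", "niño", "hola"]:
    print(w, spanish_to_ipa(w))
```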


Some time ago I found out that the macOS say command supports IPA. However, not all words seem to work.


I always wanted to have a text-to-speech system that could generate novel accents, by taking just a couple parts of the IPA at a time and substituting other sounds.

Text-to-IPA might be easy enough with dictionaries, the swapping is trivial, but IPA-to-speech seems like a harder problem.
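The swapping step really is short. It only has to match multi-character segments (like "eɪ" or "tʃ") before single characters, and it must apply the mapping in one pass so that substitutions don't chain. The "novel accent" mappings below are hypothetical examples.

```python
from typing import Dict

def swap_sounds(ipa: str, mapping: Dict[str, str]) -> str:
    """Replace IPA segments in one left-to-right, longest-match-first pass."""
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(ipa):
        for k in keys:
            if ipa.startswith(k, i):
                out.append(mapping[k])
                i += len(k)
                break
        else:  # no key matched: copy the character unchanged
            out.append(ipa[i])
            i += 1
    return "".join(out)

# Hypothetical accent: th-fronting plus a shifted FACE vowel.
accent = {"θ": "f", "ð": "v", "eɪ": "aɪ", "e": "ɛ"}
print(swap_sounds("ˈθɪŋk", accent))
print(swap_sounds("ˈneɪm", accent))
```

Because each position is rewritten once, a mapping like θ → f, f → p turns "θf" into "fp", not "pp".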


Finally some closure: Worcestershire sauce is "wʊstərʃər" according to https://en.wikipedia.org/wiki/Worcestershire_sauce


In Received Pronunciation /r/ is only pronounced before vowels. Otherwise, it usually modifies the previous vowel. "wʊstəʃə" is a better phonetic transcription.
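That rule is mechanical enough to sketch: drop r (written r or ɹ) unless the next segment is a vowel. Stress marks and spaces are skipped when looking ahead, so a following word-initial vowel keeps the r, as in RP linking r. This sketch does not model the vowel changes mentioned above or further elisions like "wʊstʃə". The vowel set is a reasonable but non-exhaustive assumption.

```python
VOWELS = set("iyɨʉɯuɪʏʊeøɘɵɤoəɛœɜɞʌɔæɐaɶɑɒ")
SKIP = set("ˈˌː ")  # stress marks, length mark and spaces are skipped when looking ahead

def make_non_rhotic(ipa: str) -> str:
    """Delete r/ɹ unless the next non-skipped segment is a vowel."""
    out = []
    for i, ch in enumerate(ipa):
        if ch in "rɹ":
            nxt = next((c for c in ipa[i + 1:] if c not in SKIP), "")
            if nxt not in VOWELS:
                continue  # pre-consonantal or final r is not pronounced
        out.append(ch)
    return "".join(out)

print(make_non_rhotic("wʊstərʃər"))
print(make_non_rhotic("fɑr əˈweɪ"))
```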


Or even "wʊstʃə".



Didn't know about IPA before seeing this, thanks for sharing.

Saying "hacker news" in IPA using your tool: https://speak-ipa.bearbin.net/speak.cgi?speak=%27h%C3%A6k%C9...
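The links in this thread suggest the tool reads the IPA from a "speak" query parameter, percent-encoded as UTF-8. Below is a small command-line helper built on that observation, partly answering the earlier request for a command line version. It only builds URLs. Whether the 2017 endpoint still responds, and with what audio format, is an assumption to check.

```python
import sys
from urllib.parse import quote

# Endpoint as it appears in links in this thread; may no longer be online.
BASE = "https://speak-ipa.bearbin.net/speak.cgi"

def speak_url(ipa: str) -> str:
    """Build a speak.cgi URL, percent-encoding every non-ASCII IPA symbol as UTF-8."""
    return BASE + "?speak=" + quote(ipa, safe="")

if __name__ == "__main__":
    # Usage: python speak_url.py "'hæk" "əˈnæpəlɪs"
    for arg in sys.argv[1:]:
        print(speak_url(arg))
```

For example, an ASCII apostrophe used as a stress mark encodes as %27 and æ as %C3%A6, matching the prefix of the "hacker news" link above.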


I checked a few examples from my native language (Polish) and they sound amusing at best.


Hey cool! I finally have somewhere on the web that will make audio for my Markov syllable-synthesis engine!

http://blurtmime.appspot.com/


Doesn't seem to work in Safari, probably because the audio is a wav.


If you mean you get prompted to download speaky.cgi, that happens in Firefox/Windows too.


This is great, but I have to use Chrome to use it. I'd prefer to use Safari.

Do you plan to let Wikipedia use this? It would be really useful on their site.


Back in the 80s, a friend of mine had a text-to-speech program running on a Z80 machine (I think), which sounded a lot like this.


This appears to be just a web wrapper around http://espeak.sourceforge.net/, and not a very slick one either. For one, it could use the <audio> element instead of forcing me to download the sound file.


This doesn't seem to use espeak, since while the --ipa flag can make espeak output IPA characters, it can't take IPA input (or I don't know the correct incantation).



