
But Hinton recently made significant advances in speech recognition using RBMs.

Google Tech Talk: http://www.youtube.com/watch?v=VdIURAu1-aU




He didn't make any advance that has made its way into a full word recognizer; he's merely recognizing phonemes (which are linguistic subunits of words), and several researchers in the field have criticized his methods. Additionally, none of the top five phoneme recognizers has ever been deployed as a word recognizer, and there is little chance that any of them will be in the next few years.


The concept of phonemes isn't undisputed either. When analyzing actual speech, it becomes clear that there are no real steady states, but a lot of coarticulation between the "segments". Of course, part of it could be attributed to the fact that speech sounds are produced by articulatory gestures, which necessarily overlap in time. On the other hand, these coarticulation patterns are not language-independent, so a purely (articulatory/auditory) phonetic explanation of why these differences exist is rather unlikely. I know this seems rather off-topic with regard to speech recognition, but the question of the basic building blocks of language is kind of at the heart of the problem.


I agree that it's at the heart of it (and I'm presently writing a paper where I'm using articulatory-phonetic features rather than phonemes). Unfortunately, there is no large-vocabulary speech recognizer that uses articulatory phonetics (yet!). Every large-scale speech recognizer, and most small-scale ones, use phonemes and are trained on speech that has been transcribed into phonemes. There is almost no data annotated with articulatory phonetics (a problem I'm working on right now). For readers outside linguistics, a purely illustrative sketch of the two representations follows below.
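This is my own toy example, not anything from the paper mentioned above; the feature names are just standard descriptive terms for these consonants:

    # Illustrative only: a phoneme is an atomic symbol, while an
    # articulatory-feature representation decomposes it into gestures.
    phonemes = ["p", "b", "m"]  # atomic units used by most recognizers

    articulatory_features = {
        "p": {"voicing": "voiceless", "place": "bilabial", "manner": "stop"},
        "b": {"voicing": "voiced",    "place": "bilabial", "manner": "stop"},
        "m": {"voicing": "voiced",    "place": "bilabial", "manner": "nasal"},
    }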


I guess that's in part because it's even more difficult to (manually) transcribe speech into articulatory-phonetic elements based on the acoustic signal (laryngeal gestures?? Clearly they are there in articulation, but their acoustic correlates are masked to some extent).

Automatic alignment methods are probably quite hard to implement, given the various coarticulation patterns in the signal depending on context/prosodic position etc.

Could you provide a link to papers or other materials dealing with articulatory features in speech recognition?

I guess I should take another look at Browman/Goldstein's Articulatory Phonology.


Nice post. Question: is there a relation between RBMs and HMMs?


Kind of. Both RBMs and HMMs are a means of squeezing statistics out of something so that you can take the something away and still know what it looked like. RBMs are a bit more involved (and hence a whole lot slower, which still matters in speech recognition), while HMMs are simpler but good enough for people to stick with them (frustratingly for all the people who propose something fancier).
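If it helps to make that concrete, here is a minimal sketch of both ideas using off-the-shelf libraries. It assumes numpy, scikit-learn's BernoulliRBM, and the hmmlearn package are available, and the toy data is purely synthetic, standing in for acoustic feature frames:

    # Both models are fit to data, after which the data can be thrown away:
    # you can still score new observations or sample plausible ones back.
    import numpy as np
    from sklearn.neural_network import BernoulliRBM
    from hmmlearn import hmm  # assumed installed; not part of scikit-learn

    rng = np.random.default_rng(0)

    # HMM: a sequential model over (fake) 13-dimensional feature frames.
    frames = rng.normal(size=(500, 13))
    ghmm = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    ghmm.fit(frames)                           # learn transition + emission statistics
    print("HMM log-likelihood:", ghmm.score(frames))
    sampled_frames, states = ghmm.sample(10)   # regenerate plausible frames

    # RBM: an unordered model over (fake) binary feature vectors.
    bits = (rng.random((500, 20)) > 0.5).astype(float)
    rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20,
                       random_state=0)
    rbm.fit(bits)                              # contrastive-divergence training
    print("RBM pseudo-log-likelihood:", rbm.score_samples(bits).mean())
    reconstruction = rbm.gibbs(bits[:10])      # one Gibbs step back through the model

The point of the comparison is only the shared workflow (fit the model, discard the data, then score or sample), not the acoustic modeling itself.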



