Hacker News
Conversational Speech Recognition System [pdf] (microsoft.com)
82 points by dsr12 on Aug 21, 2017 | hide | past | favorite | 11 comments



So, for people that aren't aware:

There are a couple of big shops that publish papers like this every year, solely so they can tell their managers and potential professional-services customers that they have "the best ASR system". It's BS.

IBM and Microsoft are among the most guilty.

If you're seriously interested in state-of-the-art ASR performance, "systems" papers like this are largely nonsense. They usually spend most of their effort tuning hyperparameters to small publicly available datasets, to the point that the model often won't generalize well to real-world settings because of how aggressively the benchmark set has been overfit.


In a similar vein, IBM has released claims of simulating a whole cat brain [1] and made many claims about Watson [2]. In general, any claim nowadays that IBM has invented or done something useful should be taken with a pinch of salt; it's rarely anywhere near as good or complete as claimed.

[1] http://www.lovemeow.com/ibm-cat-brain-simulation-a-hoax-1607...
[2] https://www.linkedin.com/pulse/people-have-been-asking-why-i...


I do find it strange that their test WER is uniformly lower than their devset WER. I would expect validating on the devset to lead to the opposite effect, where the model overfits to the devset and performance degrades during test.

Or do you mean by "tuning hyperparameters to small publicly available datasets" that the datasets do not have enough real-world variability and can be fit too easily? Are there no large, realistic datasets or are those just not available to the public?
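For anyone fuzzy on what's actually being compared between the dev and test sets: WER is just word-level edit distance normalized by the reference length. A minimal sketch (my own illustration, not code from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein DP over words: substitutions, insertions, deletions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word over six reference words -> 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

The point is that a few percentage points of WER difference between dev and test can come purely from how the two sets were sampled, which is why the dev-lower-than-test intuition doesn't always hold on small benchmarks.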


Even though they'll use the dev set to tune hyperparameters, they'll end up making the "is this a good model for the paper?" decision based on the test set. For example, they'll try different architectures, tune each one's hyperparameters on the dev set, then decide which architecture to keep by looking at the test set. So that's part of it. AKA "gradient descent via grad student".

The other part is, as you say, relatively constrained data sets.
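To make the selection-bias point concrete, here's a toy simulation (hypothetical numbers, nothing to do with the actual paper): if every candidate architecture has the same true WER and you pick the one with the best measured test score, the reported number is optimistically biased below the truth.

```python
import random

random.seed(0)

TRUE_WER = 10.0   # hypothetical true error rate shared by every candidate model
NOISE = 0.5       # measurement noise from scoring on a finite test set
N_MODELS = 20     # number of architectures the "grad student" tries

# Each candidate's measured test-set WER = true WER + evaluation noise.
test_scores = [TRUE_WER + random.gauss(0, NOISE) for _ in range(N_MODELS)]

# Selecting the architecture by its *test* score...
best_reported = min(test_scores)

# ...yields a reported WER that sits below the true error rate.
print(f"reported: {best_reported:.2f}, true: {TRUE_WER:.2f}")
```

The more architectures you try while peeking at the test set, the larger the gap between the reported and true error rates tends to get.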


We're so close to voice recognition being perfected, but we can't seem to get over the final hurdles to everyone using it.

Is it simply a matter of collecting more data?

Google is collecting more voice data in Docs.

http://www.pcworld.com/article/3038200/data-center-cloud/how...

And Mozilla is doing common voice:

https://voice.mozilla.org


I dislike the way Google collects AI training data through reCAPTCHA, to the extent that I often won't bother logging in to sites like Stack Overflow that use it. Common Voice sounds like a much more open approach that I'd actually be interested in helping.

From this paper, Microsoft is hitting accuracy similar to that of professional transcribers, so it may not be accuracy that's holding back adoption.

There also seems to be a trend away from voice interaction these days: I send messages far more than I talk to people by phone, and I don't think I'm alone in preferring this in many cases. Automatic transcription of voice messages might be useful, but I suspect privacy concerns would keep it from becoming widespread without more trust in the companies providing these services.


I send more text messages these days too. However, I try to dictate them. Basic editing by voice is definitely needed.


There are many kinds of voice recognition software. The ones that give state-of-the-art accuracy are too expensive to run at internet scale. Maybe Google can do it with its specialized TPUs, which run inference more cheaply than GPUs.


This is from Oct 2016. Previous discussion: https://news.ycombinator.com/item?id=12736409

Are you sure you didn't mean to link to this one? https://www.microsoft.com/en-us/research/wp-content/uploads/...

It just came out and is an update of the original 2016 system.



That was very kind, unlikelymordant!



