I think the biases are pretty obvious, but the most serious shortcoming of this benchmark is that their result (30% WER on CV) is not reproducible: it's not clear what they trained their model on, and the model itself is not available, so you just have to take their word for it.
More than "kind of like" -- you nailed it: exactly like. This has been an infamous issue in machine learning for decades; researchers and developers can fall into it quite accidentally if they're not careful.
The thing is that training data is very often hard to come by, because of monetary or other costs, so it's extremely tempting to share some data between training and testing -- yet that's a cardinal sin for the reason you said.
Historically, a number of the best machine learning packages (OCR, speech recognition, and more) have owed their edge more to the quality of their data (including a clean separation of training and test sets) than to the underlying algorithms themselves.
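To make that concrete, here's a minimal sketch (plain Python, with a made-up record format and hypothetical speaker IDs -- not anyone's actual pipeline) of the kind of speaker-disjoint split that guards against accidental leakage in speech data: whole speakers go to either train or test, never both, so the model can't be graded on voices it already heard.

```python
import hashlib

# Hypothetical record format: (audio_path, transcript, speaker_id).
# The point is to split by speaker, not by individual clip, so that no
# speaker's voice appears in both the training and the test set.

def split_by_speaker(records, test_fraction=0.1):
    """Deterministically assign whole speakers to train or test."""
    train, test = [], []
    for audio_path, transcript, speaker_id in records:
        # Hash the speaker ID so the assignment is stable across runs
        # and independent of how many clips each speaker contributed.
        digest = hashlib.sha256(speaker_id.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        if bucket < test_fraction * 100:
            test.append((audio_path, transcript, speaker_id))
        else:
            train.append((audio_path, transcript, speaker_id))
    return train, test

# Example usage with made-up records:
records = [
    ("clip_001.wav", "hello world", "speaker_a"),
    ("clip_002.wav", "good morning", "speaker_a"),
    ("clip_003.wav", "open the door", "speaker_b"),
]
train_set, test_set = split_by_speaker(records)
```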
Anyway, it's important to note that developers can and do fall into this trap naively, not only when they're trying to cheat -- they can have good intentions and still get it wrong.
Therefore, discussions of methodology, like the one above, are pretty much always in order, and not inherently some kind of challenge to the honesty and honor of the devs.