
I think the biases are pretty obvious, but the most serious shortcoming of this benchmark is that their result (30% WER on Common Voice) is not reproducible: it's not clear what they trained their model on, and the model itself is not available, so you just have to take their word for it.
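
For anyone unfamiliar: WER is word error rate, i.e. the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A rough sketch of the textbook computation in Python (this is not Picovoice's code, just the standard definition):

    # Word error rate: (substitutions + insertions + deletions) / reference words,
    # computed as a word-level Levenshtein distance.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between the first i ref words and first j hyp words
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    wer("the cat sat", "the cat sat down")  # one insertion -> 1/3 ~= 0.33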



Thanks for the comment. Just wanted to quickly clarify that the model is available here: https://github.com/Picovoice/stt-benchmark/tree/master/resou...


No, "reproducible" means that I can train it on the same data you used, and get the claimed result.

Anything other than that is taking your word for it.


As someone who has no idea about ML, why is knowing how it was trained important?

I would guess to anticipate possible problems, but I don't really know.


To make sure it wasn't trained on the test set, even if by accident.
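
A cheap guard against the accidental case is to verify that the two sets are literally disjoint. A sketch, with made-up data/train and data/test paths; hashing the raw bytes catches duplicates even when filenames differ:

    # Sanity check: no test clip should also appear in the training set.
    import hashlib
    from pathlib import Path

    def audio_hashes(directory: str) -> set:
        return {hashlib.sha256(p.read_bytes()).hexdigest()
                for p in Path(directory).glob("*.wav")}

    overlap = audio_hashes("data/train") & audio_hashes("data/test")
    assert not overlap, f"{len(overlap)} test clips also appear in training!"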


Kinda like creating your acceptance tests from the same scripts as your unit tests... the results would look good, but in fact be less reliable than they appear.


More than kind of like; you nailed it: exactly like. This has been an infamous issue with machine learning for decades, where unwary researchers/developers can do this quite accidentally if they're not careful.

The thing is that training data is very often hard to come by due to monetary or other cost, so it's extremely tempting to share some data between training and testing -- yet it's a cardinal sin for the reason you said.

Historically there have been a number of cases where the leading machine learning packages (OCR, speech recognition, and more) have been the best because of the quality of their data (including keeping training data properly separated from test data) more than because of the underlying algorithms themselves.

Anyway it's important to note that developers can and do fall into this trap naively, not only when they're trying to cheat -- they can have good intentions and still get it wrong.

Therefore discussions of methodology, as above, are pretty much always in order, and not inherently some kind of challenge to the honesty and honor of the devs.
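
A concrete speech example of the accidental version: splitting a corpus randomly per clip, so the same speaker's voice ends up in both train and test and quietly inflates the score. The fix is to split at the speaker level. A sketch (the 'speaker_id' field is a hypothetical stand-in for whatever metadata the corpus actually provides):

    import random

    def split_by_speaker(clips, test_fraction=0.1, seed=0):
        """clips: list of dicts, each with at least a 'speaker_id' key."""
        # Partition at the speaker level so no voice crosses the boundary.
        speakers = sorted({c["speaker_id"] for c in clips})
        random.Random(seed).shuffle(speakers)
        n_test = max(1, int(len(speakers) * test_fraction))
        held_out = set(speakers[:n_test])
        train = [c for c in clips if c["speaker_id"] not in held_out]
        test = [c for c in clips if c["speaker_id"] in held_out]
        return train, test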



