Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? [pdf] (jmlr.org)
36 points by alexcasalboni on May 21, 2015 | 7 comments



Random Forests are great on many tasks, but this analysis is heavily biased: it only includes the small, simple datasets in the UCI repository. Many real-world tasks are far more complex than that, especially those involving text, speech, images, video, and large-scale web data.


This is interesting experimental evidence in spite of the no free lunch (NFL) theorem, which refutes the notion of a generally superior algorithm.

https://en.wikipedia.org/wiki/No_free_lunch_theorem

I would reconcile the two by saying that the UCI repository contains a biased subset of all theoretically possible classification tasks.


This should not surprise anyone, because the no free lunch theorem assumes that every possible dataset is equally likely, i.e. that labels are effectively random. In reality, real-world problems are probably drawn from some structured distribution, and datasets are not totally random.
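To make the averaging argument concrete, here is a minimal sketch, not the theorem itself. It assumes a toy input space of all 3-bit vectors, a fixed 5/3 train/held-out split, and two illustrative learners (`majority_learner` and `nearest_neighbor_learner` are made-up names). Averaged uniformly over every possible labeling, both learners score exactly 50% on the held-out inputs.

```python
from itertools import product

# Toy setup (all choices here are illustrative assumptions):
# the inputs are the 8 possible 3-bit vectors; 5 are used for
# training and the remaining 3 are held out.
INPUTS = list(product([0, 1], repeat=3))
TRAIN, TEST = INPUTS[:5], INPUTS[5:]


def majority_learner(train_pairs, x):
    """Predict the most common training label (ties go to 1)."""
    ones = sum(label for _, label in train_pairs)
    return 1 if 2 * ones >= len(train_pairs) else 0


def nearest_neighbor_learner(train_pairs, x):
    """Predict the label of the closest training input (Hamming distance)."""
    def hamming(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    _, label = min(train_pairs, key=lambda pair: hamming(pair[0], x))
    return label


def average_off_training_accuracy(learner):
    """Average held-out accuracy over every possible labeling of INPUTS."""
    total_correct = 0
    total_predictions = 0
    for labels in product([0, 1], repeat=len(INPUTS)):
        target = dict(zip(INPUTS, labels))
        train_pairs = [(x, target[x]) for x in TRAIN]
        for x in TEST:
            total_correct += learner(train_pairs, x) == target[x]
            total_predictions += 1
    return total_correct / total_predictions


for learner in (majority_learner, nearest_neighbor_learner):
    print(learner.__name__, average_off_training_accuracy(learner))
# prints:
# majority_learner 0.5
# nearest_neighbor_learner 0.5
```

The tie happens because, for any fixed training labels, each held-out input is labeled 0 in exactly half of the possible targets and 1 in the other half. Real datasets are not sampled this way, so one method can still win consistently in practice.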


It makes its point well, but I'd like to see a follow-up paper addressing neural networks. Successful deep neural networks are extremely complex, and they outperform everything the authors consider on real-world problems that the paper doesn't cover. What implications can we draw from that?


Neural networks tend to overfit very easily, which is the main reason other methods usually outperform them. They are mainly successful where they can exploit the structure of a problem, in ways other methods can't.

For example, convolutional neural networks take advantage of local structure within images and the fact that nearby pixels are related to each other.
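As a rough illustration of what "local structure" means here, the sketch below slides one small shared kernel over a tiny image in plain Python. The function name `conv2d_valid`, the 4x4 image, and the 2x2 kernel are all made up for the example; a real network would learn its kernels and use an optimized library.

```python
def conv2d_valid(image, kernel):
    """Slide one small kernel over the image (no padding, stride 1).

    Strictly speaking this is cross-correlation, which is what most
    deep learning libraries compute under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A 2x2 horizontal-difference kernel: it responds where the left and
# right neighbors differ, and gives 0 on flat regions.
kernel = [
    [-1, 1],
    [-1, 1],
]

print(conv2d_valid(image, kernel))
# prints [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

Each output value depends only on a 2x2 neighborhood, and the same four weights are reused at every position, which is far fewer parameters than connecting every pixel to every output.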

However, I'd really like to see the reemergence of Bayesian neural networks, which can address the overfitting problem. Also, methods like dropout are relatively new, and they alleviate overfitting a lot more than was possible in the past.
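For readers who haven't seen dropout, here is a minimal sketch of the common "inverted dropout" variant in plain Python. The function name `dropout` and its parameters are assumptions for illustration, not any particular library's API.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout on a list of activations.

    During training, each unit is zeroed with probability drop_prob and
    the survivors are scaled by 1 / (1 - drop_prob), so the expected
    value of each activation is unchanged. At test time the input is
    returned as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0] * 10000
dropped = dropout(hidden, drop_prob=0.5, rng=rng)

# Roughly half the units are zeroed and the rest are doubled,
# so the mean stays close to 1.0.
print(sum(dropped) / len(dropped))
```

Because a different random subset of units is silenced on every training step, the network can't rely on any single unit, which is the intuition for why it reduces overfitting.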


So basically, to get 93% (on average) of the value of machine learning, you can use BigML's extremely easy interface [1], even without writing code?

[1]http://blog.bigml.com/2013/07/01/you-dont-need-coursera-to-g...


Do we need hundreds of reposts?



