Show HN: Machine learning cheat sheet (eferm.com)
166 points by Emore on May 4, 2011 | 13 comments



In case you see the cheat sheet and think, "Wow, I'd love to understand that," there's an excellent (albeit challenging) complete course on machine learning in Stanford's "engineering everywhere" online repository. http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a...


Another option is "Programming Collective Intelligence," by Toby Segaran. I read through it recently on a long flight to Australia. It's one of the most straightforward AI books out there, presenting most of these algorithms in just a few pages with nice sample Python code and diagrams. A perfect intro/refresher, and it takes a web developer's perspective on these techniques.

Since reading it I've noticed how many friends have it on their bookshelves.

Here's a link: http://oreilly.com/catalog/9780596529321


I haven't read the COIN book, but if you want to get aggressive you can go for "Elements of Statistical Learning".

Free PDF download; probably not a one-flight book:

http://www-stat.stanford.edu/~tibs/ElemStatLearn/

side note: Nat, did you intern at SGI in the late 90s, as the self-titled "armchair programmer of the apocalypse"?


While it does a great job of explaining many AI concepts in an unintimidating fashion, the Python code in it is rather buggy. On balance, I'd still recommend it as an intro.

The errata page: http://oreilly.com/catalog/errataunconfirmed.csp?isbn=978059...


All of the algorithms that require training can be optimized using stochastic gradient descent, which is very effective for large data sets (see http://leon.bottou.org/research/stochastic).
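
If it helps to see it concretely, here's a toy sketch of SGD for plain least-squares linear regression (the squared-error loss, learning rate, and epoch count are arbitrary choices for illustration, not anything specific from Bottou's page):

    import random

    def sgd_train(data, lr=0.01, epochs=10):
        """Stochastic gradient descent for least-squares linear regression.
        data: list of (features, target) pairs, features as lists of floats."""
        w = [0.0] * len(data[0][0])
        for _ in range(epochs):
            random.shuffle(data)  # visit examples in random order each pass
            for x, y in data:
                pred = sum(wi * xi for wi, xi in zip(w, x))
                err = pred - y    # derivative of 0.5 * (pred - y)**2
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        return w

Each update touches only one example, which is what makes it attractive for data sets too large to fit in memory.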

Also, here are some additions for the online learning column:

* Online SVM: http://www.springerlink.com/index/Y8666K76P6R5L467.pdf

* Online gaussian mixture estimation: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87....

One more thing: why no random forests? Or decision tree ensembles of any sort?


Thanks for the comments!

The course unfortunately couldn't cover all material on all algorithms, so the cheat sheet basically reflects my own knowledge rather than everything that's possible. I've added references to the Online SVM and the online mixture model, though; thanks for those.

Also, I'll have to look into stochastic gradient descent!


KNN "no learning involved": one probaby wants to cross-validate K at the least, if not learn the metric.

For some methods the sheet says online learning isn't applicable. As pointed out elsewhere, the objectives for K-means and mixture models could be fitted with stochastic gradient descent. In general there is always an online option; for example, keep a restricted set of items and chuck out the ones that seem less useful as others come in.
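
For K-means in particular, the online version is barely more code than the batch one. A toy sketch (MacQueen-style updates; the decaying 1/count step size is one common choice, not the only one):

    def online_kmeans(stream, centers):
        """Online K-means: each arriving point nudges its nearest center.
        centers: list of initial center vectors, e.g. the first K points."""
        counts = [1] * len(centers)
        for x in stream:
            dist = lambda c: sum((ci - xi) ** 2 for ci, xi in zip(c, x))
            j = min(range(len(centers)), key=lambda i: dist(centers[i]))
            counts[j] += 1
            step = 1.0 / counts[j]  # decaying per-center learning rate
            centers[j] = [ci + step * (xi - ci)
                          for ci, xi in zip(centers[j], x)]
        return centers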

(Aside: I have a very introductory lecture on machine learning on the web, though it's not for anyone who already knows the methods on this cheat sheet: http://videolectures.net/bootcamp2010_murray_iml/)


Thanks for the comments!

Good point about using cross-validation to learn K; I forgot about that. I've added it to the cheat sheet.

Also, regarding online learning methods: I was probably a bit quick to dismiss certain algorithms as not supporting online learning; in coursework we unfortunately didn't have time to delve into all aspects of all algorithms. I've rewritten the Online column entries as "To be added" for the online methods I'm not familiar with (yet). Someone else is, of course, free to fork it on GitHub: http://github.com/Emore/mlcheatsheet


Nice summary; I like the format as well. However, the title of the cheat sheet is misleading, since (a) many of the algorithms listed can be used for non-linear classification, and (b) some of them, such as naive Bayes and the perceptron, are supervised learning methods, since they're trained on sample inputs with expected outputs (supervisory signals).
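
To make (b) concrete, here's the perceptron in a few lines; the expected label y is exactly the supervisory signal (a toy sketch, with labels assumed to be -1/+1):

    def perceptron_train(data, epochs=10):
        """Classic perceptron updates. data: (features, label) pairs,
        label in {-1, +1} -- the label is the supervisory signal."""
        w = [0.0] * len(data[0][0])
        b = 0.0
        for _ in range(epochs):
            for x, y in data:
                activation = sum(wi * xi for wi, xi in zip(w, x)) + b
                if y * activation <= 0:  # mistake: move boundary toward y
                    w = [wi + y * xi for wi, xi in zip(w, x)]
                    b += y
        return w, b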

Otherwise, this is awesome. Hopefully you will add to it, and make it available in web form.


Thanks for the feedback!

I've changed the title to "Algorithms for Supervised- and Unsupervised Learning", which is definitely more appropriate. Initially the cheat sheet only contained linear classifiers, hence the misleading title.


Fantastic work! I have an ML exam coming up, and this should really help. If I'm honest, it's one of the subjects I've struggled with the most. It seems experts in the field, while incredibly intelligent, have a hard time breaking the material down into structured, easily digestible pieces of information.


No idea what I'm looking at, but it definitely looks cool.


I'm taking this class next semester and downloaded the sheet, so hopefully I'll understand it later and it will come in handy. Thanks!



