
Huge, if true.

Researchers included a link to (C++) source code: http://research.csc.ncsu.edu/nc-caps/yykmeans.tar.bz2




I agree it's useful, although there is also a more scalable approximate alternative to both classical k-Means and the new Yinyang k-Means (it is referenced in the paper), namely Mini-batch k-Means:

http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

This approximate method is implemented (at least) in sofia-ml and scikit-learn.
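
For anyone curious what the mini-batch update looks like, here is a minimal pure-Python sketch of the per-center learning-rate update described in the linked paper. The names `sq_dist` and `mini_batch_kmeans`, the parameters and their defaults are my own choices for illustration and are not taken from sofia-ml or scikit-learn; it also skips refinements such as careful initialization.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mini_batch_kmeans(points, k, batch_size=32, iterations=200, seed=0):
    """Approximate k-means that updates centers from small random batches."""
    rng = random.Random(seed)
    # Illustrative initialization: k distinct positions drawn from the data.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Cache the nearest center for every batch point before moving any center.
        nearest = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in batch
        ]
        for p, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate, shrinks over time
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(500)]
    data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(500)]
    print(mini_batch_kmeans(data, k=2))
```

Each iteration touches only `batch_size` points instead of the full dataset, which is where the scalability comes from; the price is that the centers are only an approximation of what full k-Means would give.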


Another approach is to transform the dataset into a smaller but representative dataset, called a core-set, and run k-means on that core-set instead, which produces a "(1+ε)-approximation for the optimal cluster centers".

See http://people.csail.mit.edu/dannyf/kmeancoreset.pdf
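
To make the "cluster a small weighted summary" idea concrete, here is a rough pure-Python sketch. Note that it is not the construction from the linked paper: it uses a simpler importance-sampling scheme in the style of "lightweight coresets" (Bachem, Lucic and Krause), which as far as I know gives a weaker guarantee than the (1+ε) bound quoted above. The function names, parameters and defaults are all my own for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lightweight_coreset(points, m, seed=0):
    """Sample m weighted points, favoring points far from the data mean."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    dists = [sq_dist(p, mean) for p in points]
    total = sum(dists)
    if total == 0:
        q = [1.0 / n] * n  # all points identical: fall back to uniform sampling
    else:
        # Half uniform, half proportional to squared distance from the mean.
        q = [0.5 / n + 0.5 * d / total for d in dists]
    idx = rng.choices(range(n), weights=q, k=m)  # sampling with replacement
    sample = [points[i] for i in idx]
    weights = [1.0 / (m * q[i]) for i in idx]  # keeps weighted sums unbiased
    return sample, weights


def weighted_kmeans(points, weights, k, iterations=20, seed=0):
    """Plain Lloyd iterations in which every point carries a weight."""
    rng = random.Random(seed)
    # Start from k distinct points so no two centers coincide initially.
    distinct = list(dict.fromkeys(tuple(p) for p in points))
    centers = [list(p) for p in rng.sample(distinct, k)]
    dim = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if mass[j] > 0:  # an empty cluster keeps its previous center
                centers[j] = [s / mass[j] for s in sums[j]]
    return centers
```

Usage would be something like `sample, w = lightweight_coreset(data, m=200)` followed by `weighted_kmeans(sample, w, k=2)`, so the expensive clustering step only ever sees `m` points.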



