Huge, if true. Researchers included a link to (C++) source code: http://research...

ogrisel · on July 15, 2015

I agree it's useful, although there is also a more scalable, approximate alternative to both classical k-Means and the new Yinyang k-Means (referenced in the paper), namely Mini-batch k-Means:

http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

This approximate method is implemented (at least) in sofia-ml and scikit-learn.
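
For anyone curious what the mini-batch update looks like, here is a minimal pure-Python sketch of the per-center learning-rate update described in the linked paper. It is only an illustration, not the sofia-ml or scikit-learn code: the function names, the defaults (`batch_size=20`, `n_iter=200`) and the two-blob toy data are all my own assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def mini_batch_kmeans(points, k, batch_size=20, n_iter=200, seed=0):
    # Hypothetical helper; parameter names and defaults are assumptions.
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    counts = [0] * k                                    # points seen per center
    for _ in range(n_iter):
        batch = rng.sample(points, batch_size)
        # Cache assignments for the whole batch before moving any center.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]                       # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers

# Toy data: two well-separated Gaussian blobs around (0, 0) and (10, 10).
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(500)]
blob_b = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(500)]
centers = sorted(mini_batch_kmeans(blob_a + blob_b, k=2))
print(centers)
```

Each iteration only touches `batch_size` points, which is where the scalability comes from. In scikit-learn the corresponding estimator is `sklearn.cluster.MiniBatchKMeans`.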

thecopy · on July 15, 2015

Another approach is to transform the dataset into a smaller but representative dataset, called a core-set, and run k-means on that core-set instead, which produces a "(1+ε)-approximation for the optimal cluster centers".

See http://people.csail.mit.edu/dannyf/kmeancoreset.pdf
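
To make the idea concrete, here is a rough pure-Python sketch of the general workflow: build a small weighted sample, then run weighted k-means on it. Note that this is not the construction from the linked paper. It uses a much simpler importance-sampling scheme (in the style of so-called lightweight coresets), which as far as I know does not carry the same (1+ε) guarantee. All function names, the coreset size `m=50`, the farthest-first initialization and the toy data are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sampled_coreset(points, m, seed=0):
    # Hypothetical helper: importance sampling, NOT the linked paper's method.
    # Assumes the points are not all identical (otherwise `total` is zero).
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    dists = [sq_dist(p, mean) for p in points]
    total = sum(dists)
    # Half uniform mass, half proportional to squared distance from the mean.
    probs = [0.5 / n + 0.5 * d / total for d in dists]
    idx = rng.choices(range(n), weights=probs, k=m)
    # Weight 1 / (m * q(x)) keeps weighted sums unbiased estimates of full sums.
    return [(points[i], 1.0 / (m * probs[i])) for i in idx]

def weighted_kmeans(weighted, k, n_iter=20):
    pts = [p for p, _ in weighted]
    dim = len(pts[0])
    # Farthest-first initialization (a simple deterministic choice, assumed here).
    centers = [list(pts[0])]
    while len(centers) < k:
        far = max(pts, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in weighted:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if mass[j] > 0:  # leave a center in place if nothing was assigned
                centers[j] = [s / mass[j] for s in sums[j]]
    return centers

# Toy data: 1000 points in two blobs, reduced to a 50-point weighted sample.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(500)]
blob_b = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(500)]
coreset = sampled_coreset(blob_a + blob_b, m=50)
coreset_centers = sorted(weighted_kmeans(coreset, k=2))
print(coreset_centers)
```

The clustering step only ever sees the 50 weighted points, so its cost no longer depends on the size of the original dataset; only the one-off sampling pass does.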