Hacker News new | past | comments | ask | show | jobs | submit login

I haven't read the paper yet, but I'm about to. Before I do, I want to ask: do people usually implement O(n^2) distance calculations in k-means when trying to find the closest centroid that each point is to? I think that's what I usually see, which seems so dumb to me. Are metric trees that obscure?




That would be interesting to investigate but you have to keep in mind that the centroids move between each iterations. So you have to re-build a new metric tree at each iteration. kd-tree / build time complexity are between `n log(n)` and `n (log(n))^2`. You don't benefit from the classifical speed-up of metric trees where you build the tree once and then do many queries on the same tree.

Also metric trees such as k-d tree and ball tree do no bring any speed-up in high dimensions.


>Also metric trees such as k-d tree and ball tree do no bring any speed-up in high dimensions.

This is not true. Only their naive implementations don't bring speedups in high dimensions and it depends on the number of data points, query points, and dimensions. Best bin first is one common optimization.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: