I haven't read the paper yet, but I'm about to. Before I do, I want to ask: do p...

pornel · on July 15, 2015

I use vantage point trees for finding the closest point:

https://github.com/pornel/vpsearch (https://github.com/pornel/pngquant/blob/master/lib/nearest.c)

ogrisel · on July 15, 2015

That would be interesting to investigate but you have to keep in mind that the centroids move between each iterations. So you have to re-build a new metric tree at each iteration. kd-tree / build time complexity are between `n log(n)` and `n (log(n))^2`. You don't benefit from the classifical speed-up of metric trees where you build the tree once and then do many queries on the same tree.

Also metric trees such as k-d tree and ball tree do no bring any speed-up in high dimensions.

CyberDildonics · on July 15, 2015

>Also metric trees such as k-d tree and ball tree do no bring any speed-up in high dimensions.

This is not true. Only their naive implementations don't bring speedups in high dimensions and it depends on the number of data points, query points, and dimensions. Best bin first is one common optimization.