Hacker News new | past | comments | ask | show | jobs | submit login

>If I live in London and the distance from me to Paris is 50.000 km, and you live in Rome, 70.000 km from Paris, that does not give me enough to calculate the distance between Rome and London.

Yes, but you do know that the distance from London to Rome is less than 12,000 km, because of the triangle inequality axiom. So if that's too far, that search path is discarded.

Then we just start the process over again, by comparing each subtree back to the original location.




Thanks, The idea behind the BK-tree is ingenious.

I'm struggling with finding a use case for that data structure through. Why would you construct a BK-tree that would only become powerful when it contains millions of words, which would then create a nuisance when representing that amount of data in memory, making it not so fast anymore, when you could represent the same data in a compressed form and with the same (as well as an extended set of) querying capabilities?

Perhaps BK-trees are for big machines with powerful CPUs? I'm sure there is a setup that would make that tree in fact better than any other tree.


I don't think the best use case for BK-trees is spell-checking and words. The area in which they are used most successfully is image deduplication. In that case the metric you're going to use is some form of perceptual hashing.


I think you are missing important case such as image data. Another one is floating point vectors with scaled up and rounded distance.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: