What are you running this on and how fast do you need it to be? You can brute-force nearest-neighbours on a GPU pretty quickly (many times faster than an approximate search on a CPU).
I was able to do nearest-neighbour from each pixel in an image (~1e6 pixels) to a list of keypoint locations (~1e4 points) in about 50ms on an old GPU, compared to minutes of CPU time.
It's a much smaller dataset, but still a simple way of getting a huge speedup over CPU alone. Also, unlike Annoy or FLANN, the results are guaranteed to be exact. (It's also incredibly simple to implement.)
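To make the "simple to implement" part concrete, here's a rough sketch of the brute-force logic in plain Python. It is not the GPU code I ran and not the code from the link, just the idea: every query point is handled independently, so on a GPU the outer loop would become one thread per pixel. The function name and the tiny test data are made up for illustration.

```python
import math

def brute_force_nearest(queries, keypoints):
    """For each 2D query point, return the index of the closest keypoint.

    Exact search: every query is compared against every keypoint.
    Ties go to the keypoint with the lowest index.
    """
    nearest = []
    for qx, qy in queries:  # on a GPU: one thread per query point
        best_index = -1
        best_dist2 = math.inf
        for i, (kx, ky) in enumerate(keypoints):
            dx = qx - kx
            dy = qy - ky
            dist2 = dx * dx + dy * dy  # squared distance is enough for ranking
            if dist2 < best_dist2:
                best_dist2 = dist2
                best_index = i
        nearest.append(best_index)
    return nearest

# Hypothetical toy data: three keypoints, four query "pixels".
keypoints = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
queries = [(1.0, 1.0), (9.0, 1.0), (5.0, 6.0), (5.0, 0.0)]
result = brute_force_nearest(queries, keypoints)
print(result)
```

The work is queries x keypoints distance checks (about 1e10 for the sizes above), which is hopeless in pure Python but embarrassingly parallel, which is why it suits a GPU.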
See for example:
https://stackoverflow.com/questions/24020409/producing-a-nea...
EDIT: I saw below that you can split the strings by length, which will shorten the time considerably.