Some of this is that the underlying model is insufficiently trained, but some of it is disambiguation. Disambiguation in text is a very, very hard problem.
I agree disambiguation is a tough problem, but I'm curious to see whether the vector representation learned by word2vec can help in that regard. Have you considered using clustering to attach a few possible cluster memberships to each word (which would hopefully each capture information about a different usage context), then selecting the meanings for each word that give the maximum overlap?
The idea being that the words used to search on your website are likely to be conceptually similar (one doesn't add apples to oranges), so you can assume that "queen (sovereign) + man (noun) - woman" makes more sense than "queen (band) + man (unix command) - woman", because "man (noun)" and "woman" are likely to belong to similar clusters while "man (unix command)" is not.
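A minimal sketch of that idea, with toy sense vectors standing in for real word2vec output (all the vectors, sense labels, and the choice of KMeans here are assumptions for illustration): cluster every candidate sense vector, then pick the one sense per word that maximizes how many chosen senses fall in the same cluster.

```python
# Sketch: disambiguating query terms by cluster overlap.
# All vectors and sense labels are toy assumptions, not real word2vec output.
from itertools import product
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical sense vectors: each ambiguous word maps to candidate senses.
# The two "usage contexts" are made linearly separable (+5 vs -5 offsets).
senses = {
    "queen": {"queen (sovereign)":  rng.normal(0, 1, 8) + 5,
              "queen (band)":       rng.normal(0, 1, 8) - 5},
    "man":   {"man (noun)":         rng.normal(0, 1, 8) + 5,
              "man (unix command)": rng.normal(0, 1, 8) - 5},
    "woman": {"woman (noun)":       rng.normal(0, 1, 8) + 5},
}

# Cluster all sense vectors; a cluster id approximates a "usage context".
labels = [s for d in senses.values() for s in d]
X = np.array([v for d in senses.values() for v in d.values()])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_of = dict(zip(labels, kmeans.labels_))

def best_combo(words):
    """Pick one sense per word so that cluster overlap is maximal."""
    options = [list(senses[w]) for w in words]
    def overlap(combo):
        clusters = [cluster_of[s] for s in combo]
        return max(clusters.count(c) for c in set(clusters))
    return max(product(*options), key=overlap)

print(best_combo(["queen", "man", "woman"]))
# → ('queen (sovereign)', 'man (noun)', 'woman (noun)')
```

With real word2vec vectors the clusters are far less clean, so you would probably want soft memberships (e.g. a few nearest centroids per sense) rather than a single hard cluster id, but the overlap-maximization step stays the same.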
Japan - Tokyo + History = Kyoto was what I was expecting, but I guess the corpus isn't quite there yet. The actual answer was http://en.wikipedia.org/wiki/Atpara_Upazila. At least it's geographic!
Maybe just let the user choose the meaning from a drop-down list?
You can (probably) integrate Freebase's Suggest widget easily and use the Freebase API to get to the Wikipedia ID.
Some of your examples get clearer once the terms are disambiguated: Paul McCartney - Beatles + Rolling Stones puts Mick Jagger in the 3rd spot (http://www.thisplusthat.me/search/Paul%20McCartney%20-%20Bea...) once "stone" is changed to "Rolling Stones".
Thank you for the comment!