Some of this is that the underlying model is insufficiently trained, but some of it is disambiguation. Disambiguation in text is a very, very hard problem.
I agree disambiguation is a tough problem, but I'm curious to see whether the vector representation learned by word2vec can help in that regard. Have you considered using clustering to attach a few possible cluster memberships to each word (which would hopefully each capture information about a different usage context), then selecting the meanings for each word that give the maximum overlap?
The idea being that the words used to search on your website are likely to be conceptually similar (one doesn't add apples to oranges), so you can assume that "queen (sovereign) + man (noun) - woman" makes more sense than "queen (band) + man (unix command) - woman", because "man (noun)" and "woman" are likely to belong to similar clusters while "man (unix command)" is not.
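A minimal sketch of that idea, with toy sense vectors standing in for real word2vec output (all the vectors, sense labels, and the choice of KMeans here are assumptions for illustration): cluster every candidate sense vector, then pick the one sense per word that maximizes how many chosen senses fall in the same cluster.

```python
# Sketch: disambiguating query terms by cluster overlap.
# All vectors and sense labels are toy assumptions, not real word2vec output.
from itertools import product
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical sense vectors: each ambiguous word maps to candidate senses.
# The two "usage contexts" are made linearly separable (+5 vs -5 offsets).
senses = {
    "queen": {"queen (sovereign)":  rng.normal(0, 1, 8) + 5,
              "queen (band)":       rng.normal(0, 1, 8) - 5},
    "man":   {"man (noun)":         rng.normal(0, 1, 8) + 5,
              "man (unix command)": rng.normal(0, 1, 8) - 5},
    "woman": {"woman (noun)":       rng.normal(0, 1, 8) + 5},
}

# Cluster all sense vectors; a cluster id approximates a "usage context".
labels = [s for d in senses.values() for s in d]
X = np.array([v for d in senses.values() for v in d.values()])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_of = dict(zip(labels, kmeans.labels_))

def best_combo(words):
    """Pick one sense per word so that cluster overlap is maximal."""
    options = [list(senses[w]) for w in words]
    def overlap(combo):
        clusters = [cluster_of[s] for s in combo]
        return max(clusters.count(c) for c in set(clusters))
    return max(product(*options), key=overlap)

print(best_combo(["queen", "man", "woman"]))
# → ('queen (sovereign)', 'man (noun)', 'woman (noun)')
```

With real word2vec vectors the clusters are far less clean, so you would probably want soft memberships (e.g. a few nearest centroids per sense) rather than a single hard cluster id, but the overlap-maximization step stays the same.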
Japan - Tokyo + History = Kyoto was what I was expecting, but I guess the corpus isn't quite there yet. The actual answer was http://en.wikipedia.org/wiki/Atpara_Upazila. At least it's geographic!
Maybe just let the user choose the meaning from a drop-down list?
You can (probably) integrate Freebase's Suggest widget easily and use the Freebase API to get to the Wikipedia ID.
Some of your examples get clearer once the terms are disambiguated: Paul McCartney - Beatles + Rolling Stones puts Mick Jagger in the 3rd spot (http://www.thisplusthat.me/search/Paul%20McCartney%20-%20Bea...) once "stone" is changed to "Rolling Stones".
Thank you for the comment!