I tried clustering similar embeddings, but it did extremely poorly (~0%): the groupings are often deceptive, with the words in a group connected in only one narrow way, plus lots of spurious decoy groups to throw you off. Looking for groups with high similarity on only a subset of embedding dimensions might help, but I didn't have much time to play with it :) Here's a notebook to get you going if you do want to play: https://colab.research.google.com/drive/1KJeSB9Q5XzSeT9ONUJ_...
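
For anyone curious what the "subset of dimensions" idea could look like, here is a rough sketch. It is not what the notebook does, just one way to try it. Everything in it is an assumption for illustration: the helper names (`zscore_columns`, `group_score`, `best_group`), the choice of `k`, the scoring rule (a dimension only counts if every word in the group deviates from the average in the same direction), and the random stand-in embeddings with one planted group. Real word embeddings may well behave differently.

```python
from itertools import combinations

import numpy as np


def zscore_columns(emb):
    """Standardize each embedding dimension across all words."""
    return (emb - emb.mean(axis=0)) / (emb.std(axis=0) + 1e-9)


def group_score(z, idx, k):
    """Score a candidate group using only its k most agreeing dimensions."""
    g = z[list(idx)]
    # Direction the group leans in on each dimension (+1, -1, or 0).
    direction = np.sign(g.mean(axis=0))
    # Weakest member's deviation in that direction: positive only if
    # every word in the group sits on the same side of the average.
    agreement = (g * direction).min(axis=0)
    # Average over the k dimensions with the strongest agreement.
    top_k = np.partition(agreement, -k)[-k:]
    return float(top_k.mean())


def best_group(emb, group_size=4, k=3):
    """Brute-force every group of `group_size` words and return the best."""
    z = zscore_columns(emb)
    candidates = combinations(range(len(emb)), group_size)
    return max(candidates, key=lambda idx: group_score(z, idx, k))


# Synthetic stand-in for real embeddings (assumption): 16 "words", 32 dims,
# with words 0-3 sharing a strong signal on just three dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(16, 32))
emb[:4, 5:8] += 6.0

found = best_group(emb)
print(found)
```

With 16 words there are only 1,820 groups of four, so brute force is cheap. The planted signal here is deliberately strong; whether this survives the deliberate decoy groups in a real puzzle is exactly the open question.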