Well, I think it is important to remember that the dimensions of word2vec DO NOT have any specific meaning (unlike Extraversion etc. in Big Five). Everything is defined only "up to a rotation", so framing individual dimensions as if they carried fixed meanings looks clunky at best. To be fair, I may be biased, as I wrote a different intro to word2vec (http://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html).
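To make the "up to a rotation" point concrete, here is a rough NumPy sketch (the embedding matrix and the rotation are random stand-ins, not real word2vec output): applying any orthogonal matrix leaves every dot product, and hence every similarity and analogy the model is judged on, unchanged.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(0)
E = rng.standard_normal((1000, 300))       # stand-in for a word2vec embedding matrix
Q = ortho_group.rvs(300, random_state=0)   # a random orthogonal rotation

E_rot = E @ Q

# All pairwise dot products (and therefore cosine similarities) are preserved,
# so the training objective cannot distinguish E from E_rot.
assert np.allclose(E @ E.T, E_rot @ E_rot.T)
```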
For implementation, I am surprised it leaves out https://adoni.github.io/2017/11/08/word2vec-pytorch/. There are many others, including in NumPy and TF, but I find the PyTorch one the most straightforward and didactic, by a large margin.
For some context, the "up to a rotation" argument is a debate that has gone on for decades in the psychological measurement literature.
This is true, but the clustering of points in space is not arbitrary. So while the choice of axes is arbitrary, it becomes nonarbitrary if you're trying to choose the axes in such a way as to represent that clustering. This is why you end up with different rotations in factor analysis: different rotation criteria encode different definitions of how best to represent the clustering.
I think there are some ties here to compressed sensing, but that's getting a little tangential. My main point is that while it's true that the default word2vec embedding may lack meaning, if you define "meaning" in some way (even if just in terms of information loss) you can rotate it into a meaningful embedding.
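As a small illustrative sketch of the factor-analysis point (random stand-in data, not real embeddings): scikit-learn's FactorAnalysis accepts a rotation criterion such as varimax, which picks axes that concentrate the loadings, i.e. one particular definition of "represent the clustering".

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))   # stand-in for word vectors

# Same model and subspace, two different rotation choices.
fa_none = FactorAnalysis(n_components=5, random_state=0).fit(X)
fa_varimax = FactorAnalysis(n_components=5, rotation="varimax", random_state=0).fit(X)

# The loadings differ only by an orthogonal rotation; the varimax axes are the
# ones chosen to be "interpretable" under its criterion.
print(fa_none.components_[0][:5])
print(fa_varimax.components_[0][:5])
```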
Well, sort of. They do have a meaning; it's probably just not a concept that is easily findable by or understandable to humans. If you hypothetically had a large labeled corpus for a bunch of different features, you could fit linear regressions over the embedding space to find vectors that represent exactly (perhaps not uniquely) the meaning you're looking for... and from that you could imagine a function that transforms the existing embedding space into an organized one with meaning.
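A minimal sketch of that idea, assuming a dict of pre-trained vectors and some hypothetical human "happiness" ratings (both random stand-ins here; the word list and scores are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-ins: in practice these would come from a trained word2vec model and a
# labeled corpus or human ratings.
embeddings = {w: rng.standard_normal(300) for w in ["joy", "party", "grief", "funeral"]}
happiness = {"joy": 0.9, "party": 0.8, "grief": 0.1, "funeral": 0.2}

X = np.stack([embeddings[w] for w in happiness])   # (n_words, dim)
y = np.array(list(happiness.values()))             # (n_words,)

direction = LinearRegression().fit(X, y).coef_
direction /= np.linalg.norm(direction)

# Any word can now be scored on "happiness" by projecting onto this direction.
score = embeddings["party"] @ direction
```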
No, it is not true: everything is defined only up to an orthogonal rotation. It is not an SVD (though even for SVD, usually only the first few dimensions have a human interpretation).
Instead, you can (rough sketches of both below):
- rotate it with SVD (this works really well when applied to a subset of words)
- project it on given axes (e.g. "woman - man" and "king - man")
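Rough sketches of both options, assuming `emb` is any dict of pre-trained word vectors (random stand-ins here, so the numbers themselves are meaningless):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(300) for w in ["man", "woman", "king", "queen"]}

# Option 1: SVD on a (centered) subset of words; the rotated axes are ordered
# by how much variance of that subset they explain.
subset = np.stack([emb[w] for w in ["man", "woman", "king", "queen"]])
subset = subset - subset.mean(axis=0)
U, S, Vt = np.linalg.svd(subset, full_matrices=False)
rotated = subset @ Vt.T

# Option 2: project onto hand-picked difference axes.
def unit(v):
    return v / np.linalg.norm(v)

gender_axis = unit(emb["woman"] - emb["man"])
royalty_axis = unit(emb["king"] - emb["man"])

coords = {w: (emb[w] @ gender_axis, emb[w] @ royalty_axis) for w in emb}
# With real word2vec vectors, "queen" sits high on both axes.
```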
You could still interchange the dimensions arbitrarily. You can't say "dimension 1 = happiness": a re-training would not replicate that, and would not necessarily produce a dimension for "happiness" at all.
I’m not saying that. I’m saying you could identify a linear combination of x,y,z that approximates happiness, and by doing this for many concepts, transform the matrix into an ordered state where each dimension on its own is a labeled concept.
People are quick to claim that embedding dimensions have no meaning, but if giving them meaning is your goal, and your embedding space is good, you're not terribly far from getting there.
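A hedged sketch of that last step (the concept list and the `learn_direction` helper are purely illustrative placeholders): learn one direction per labeled concept, as in the regression sketch above, stack them, and map the embedding matrix into a space whose axes each carry a label.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 300

def learn_direction(concept):
    # Placeholder: in practice, fit a regression from embeddings to labels for
    # this concept and return the unit-normalized coefficient vector.
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

concepts = ["happiness", "formality", "concreteness"]
W = np.stack([learn_direction(c) for c in concepts])   # (n_concepts, dim)

E = rng.standard_normal((10_000, dim))   # stand-in for the full embedding matrix

# Each column of E_labeled now corresponds to one named concept.
E_labeled = E @ W.T                      # (vocab_size, n_concepts)
```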