For some context, the "up to a rotation" argument is something that has gone on for decades in the psychological measurement literature.
It's true that the choice of axes is arbitrary, but the clustering of points in space is not. So while the axes are arbitrary in general, the choice becomes nonarbitrary once you try to pick axes that represent the clustering of points. This is why you end up with different rotations in factor analysis: each one comes from a different definition of how best to represent the clusterings.
I think there are some ties here to compressed sensing, but that's getting a little tangential. My main point is that while it's true that the default word2vec embedding may lack meaning, if you define "meaning" in some way (even if only in terms of information loss), you can rotate it to a meaningful embedding.
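
To make that concrete, here's a rough sketch of what "rotate to a meaningful embedding" can look like. It uses varimax, which is just one of the rotation criteria from factor analysis (I'm picking it for illustration, not claiming it's the right one for word2vec). The data is a made-up 2-D toy, not real word vectors, and the names `varimax`, `varimax_criterion`, `points`, and `scrambled` are mine. The idea: build points that cluster along two axes, hide those axes behind an arbitrary rotation, then let the criterion find axes that line up with the clusters again.

```python
import numpy as np

def varimax_criterion(L):
    # Sum over columns of the variance of squared loadings.
    # Higher means each point loads mostly on a single axis.
    return float(np.sum(np.var(L ** 2, axis=0)))

def varimax(X, gamma=1.0, max_iter=100, tol=1e-6):
    # Classic SVD-based iteration for an orthogonal varimax rotation.
    # Returns the rotated points and the rotation matrix R.
    p, k = X.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = X @ R
        grad = L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0))
        u, s, vt = np.linalg.svd(X.T @ grad)
        R = u @ vt  # product of orthogonal matrices, so R stays orthogonal
        d_new = s.sum()
        if d != 0 and d_new / d < 1 + tol:
            break
        d = d_new
    return X @ R, R

# Toy data (an assumption for illustration): two clusters, each spread
# along one coordinate axis with a little noise on the other.
rng = np.random.default_rng(0)
n = 100
cluster_a = np.column_stack([rng.normal(0, 1, n), rng.normal(0, 0.05, n)])
cluster_b = np.column_stack([rng.normal(0, 0.05, n), rng.normal(0, 1, n)])
points = np.vstack([cluster_a, cluster_b])

# Hide the "meaningful" axes behind an arbitrary 30-degree rotation.
theta = np.deg2rad(30)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
scrambled = points @ Q

# Rotate again, this time choosing axes by the varimax criterion.
recovered, R = varimax(scrambled)

# Net rotation from the original axes to the recovered ones. If the
# clusters were found, this is close to a signed permutation matrix.
total = Q @ R

print("criterion before:", varimax_criterion(scrambled))
print("criterion after: ", varimax_criterion(recovered))
print("net rotation:\n", np.round(total, 3))
```

Nothing about the geometry changes here: distances and angles between points are the same before and after, since R is orthogonal. Only the axes move. Swap in a different criterion and you'd get a different "meaningful" rotation, which is exactly the factor analysis situation.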