Also, you hear that example over and over again because you can't get other ones to work reliably with Word2Vec; you'd have thought that, if it really worked, you could train a good classifier for color words or nouns or something like that, but in practice you can't.
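Roughly the kind of experiment I mean, sketched with gensim's pretrained GoogleNews vectors and a plain logistic regression (the word lists here are just toy examples, not what I actually tested):

    import gensim.downloader as api
    from sklearn.linear_model import LogisticRegression

    # Pretrained static word vectors (300-d, trained on GoogleNews)
    kv = api.load("word2vec-google-news-300")

    color_words = ["red", "green", "blue", "yellow", "purple", "orange"]
    other_words = ["table", "run", "quickly", "idea", "seven", "river"]

    words = color_words + other_words
    labels = [1] * len(color_words) + [0] * len(other_words)

    # Fit a linear classifier directly on the word vectors
    X = [kv[w] for w in words]
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    # The point of the complaint above: on a larger held-out word list
    # this kind of classifier does much worse than the analogy demos suggest
    for w in ["pink", "cloud"]:
        print(w, clf.predict([kv[w]])[0])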

Because it couldn't tell the difference between word senses, I think Word2Vec introduced about as many false positives as true positives. BERT was the revolution we needed.
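To make the word-sense point concrete: a contextual model gives the same surface word a different vector in each sentence, which a static word2vec table by construction cannot do. A quick sketch with Hugging Face transformers and bert-base-uncased, using "bank" as the ambiguous word:

    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def bank_vector(sentence):
        # Return BERT's contextual vector for the token "bank" in this sentence
        inputs = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        return hidden[tokens.index("bank")]

    v_river = bank_vector("she sat on the bank of the river")
    v_money1 = bank_vector("he deposited the cheque at the bank")
    v_money2 = bank_vector("the bank raised its interest rates")

    cos = torch.nn.functional.cosine_similarity
    # The two financial uses should be closer to each other than to the river use
    print(cos(v_money1, v_money2, dim=0).item(), cos(v_river, v_money1, dim=0).item())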

I use similar embedding models for classification and it is great to see improvements in this space.
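The pipeline is just embed-then-classify; a minimal sketch of what I mean with sentence-transformers and scikit-learn (the model name and the tiny labelled set are just placeholders):

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    texts = [
        "refund my order please",        # label 1: billing
        "I was charged twice",           # label 1: billing
        "the app crashes on startup",    # label 0: technical
        "login button does nothing",     # label 0: technical
    ]
    labels = [1, 1, 0, 0]

    # Encode each text into a fixed-size vector, then fit a linear classifier on top
    model = SentenceTransformer("all-MiniLM-L6-v2")
    X = model.encode(texts)
    clf = LogisticRegression().fit(X, labels)

    print(clf.predict(model.encode(["why was my card billed again?"])))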




The other example that worked for me with Word2Vec was Germany + Paris - France = Berlin: https://simonwillison.net/2023/Oct/23/embeddings/#exploring-...
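With gensim's pretrained GoogleNews vectors that one is a one-liner; most_similar does the add/subtract and the cosine nearest-neighbour search (and excludes the query words from the results):

    import gensim.downloader as api

    kv = api.load("word2vec-google-news-300")

    # Germany + Paris - France: add the positives, subtract the negatives,
    # then take the nearest neighbours by cosine similarity
    print(kv.most_similar(positive=["Germany", "Paris"], negative=["France"], topn=3))
    # Berlin should come out on top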


There are a bunch of these things in a word2vec space. I had a blog post years ago on my group's blog that trained word2vec on a bunch of wikias so we could find out who the Han Solo of Doctor Who is (which, somewhat inexplicably I think, was Rory Williams). You have to implement word2vec carefully, and then the similarity search, but there are plenty of vaguely interesting things in there once you do.
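I don't remember the exact setup any more, but the rough shape is: tokenise the wikia text into sentences, train a Word2Vec model, then run the same analogy-style query. A sketch with gensim (the corpus file name is a placeholder, and the scraping/cleaning is where the real work went):

    from gensim.models import Word2Vec
    from gensim.utils import simple_preprocess

    # One sentence per line, pooled from the wikias; multi-word names are assumed
    # to have been pre-joined into single tokens like "han_solo" (e.g. via
    # gensim.models.phrases or a manual replace) before this step
    sentences = [simple_preprocess(line) for line in open("wikia_corpus.txt")]

    model = Word2Vec(sentences, vector_size=200, window=5, min_count=5, workers=4)

    # "Who is the Han Solo of Doctor Who?" as vector arithmetic:
    # han_solo - star_wars + doctor_who, nearest neighbours by cosine
    print(model.wv.most_similar(
        positive=["han_solo", "doctor_who"],
        negative=["star_wars"],
        topn=5,
    ))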


It's a good point about true and false positives though, which makes me wonder if anyone's taken a large database of expected outputs from such "equations" and used it to calculate validation scores for different models in terms of precision and recall.
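gensim can already do a crude version of this: KeyedVectors.evaluate_word_analogies takes a file of a : b :: c : d questions (the classic questions-words.txt set ships with its test data) and reports top-1 accuracy per section. Precision and recall in the sense above would also need a way for the model to abstain, e.g. a similarity threshold; a sketch of that idea (the threshold value is arbitrary):

    import gensim.downloader as api
    from gensim.test.utils import datapath

    kv = api.load("word2vec-google-news-300")

    # Top-1 accuracy on the standard Google analogy question set
    score, sections = kv.evaluate_word_analogies(datapath("questions-words.txt"))
    print("top-1 accuracy:", score)

    # Closer to precision/recall: only count a prediction when the model is
    # confident enough, treat everything below the threshold as an abstention.
    # questions is a list of (a, b, c, expected) tuples, all assumed in-vocabulary.
    def precision_recall(questions, threshold=0.6):
        tp = answered = 0
        for a, b, c, expected in questions:
            word, sim = kv.most_similar(positive=[b, c], negative=[a], topn=1)[0]
            if sim < threshold:
                continue  # abstain
            answered += 1
            if word == expected:
                tp += 1
        precision = tp / answered if answered else 0.0
        recall = tp / len(questions) if questions else 0.0
        return precision, recall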



