Dense Vectors: Capturing Meaning with Code (pinecone.io)
40 points by gk1 on Oct 19, 2021 | 4 comments



Am I nitpicking when I say that the techniques mentioned in this article don't capture meaning but rather similarity? In particular, the article never mentions grammar at all. Is this because of limitations in current computing, or is the underlying assumption that grammar is redundant?

When we roll out speech-based interfaces with these limitations, do we train our children to ignore grammar?


These models capture similarity at such a scale that they learn the patterns of language well. Grammar is learned indirectly - it is not manually coded, particularly in the later models we discuss. Our focus here is on encoding text (and images) so that they can be compared to other text or images based on their meaning/imagery rather than their syntax.
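To make that comparison concrete, here is a minimal sketch of encoding sentences as dense vectors and comparing them with cosine similarity. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model purely for illustration - not necessarily what the article itself uses.

    import numpy as np
    from sentence_transformers import SentenceTransformer  # assumed library, for illustration only

    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice
    sentences = [
        "The cat sat on the mat.",
        "A feline rested on the rug.",
        "Interest rates rose sharply last quarter.",
    ]
    vectors = model.encode(sentences)  # one dense vector per sentence

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Paraphrases land close together; unrelated sentences do not.
    print(cosine(vectors[0], vectors[1]))  # high
    print(cosine(vectors[0], vectors[2]))  # low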

But the later transformer models do have a good grasp of grammar - much like children learning to speak, they are never taught grammar directly but pick it up over time from the patterns of the language.
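As a rough illustration (my own sketch, not from the article), a masked language model that was never given explicit grammar rules still tends to prefer grammatically consistent completions. This assumes the Hugging Face transformers library, with bert-base-uncased as an example model.

    from transformers import pipeline  # assumed library; BERT is just one example model

    unmasker = pipeline("fill-mask", model="bert-base-uncased")

    # No grammar rules were ever hand-coded, yet the top completions tend to
    # agree with the plural subject ("were", "are") rather than "was" or "is".
    for prediction in unmasker("The children [MASK] playing in the park."):
        print(prediction["token_str"], round(prediction["score"], 3))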


> But the later transformer models have a good grasp of grammar

AFAIK that is an open question. They don't even have a grasp of negation beyond superficial heuristics, as seen in arXiv:1902.01007.


Word embeddings, especially word2vec, are quite primitive, as you say. To capture grammar and more complex meaning, neural language models are much better.
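For example (a sketch of my own, assuming gensim and its pretrained word2vec-google-news-300 vectors), word2vec assigns one static vector per word and measures only distributional similarity, which is part of what makes it primitive next to contextual neural language models:

    import gensim.downloader as api  # assumed library for fetching pretrained word2vec vectors

    wv = api.load("word2vec-google-news-300")  # static embeddings: one vector per word

    # Similarity here is purely distributional: antonyms that occur in similar
    # contexts can still score high, and "bank" (river) vs. "bank" (finance)
    # share a single vector because word2vec has no notion of context.
    print(wv.similarity("good", "bad"))
    print(wv.most_similar("bank", topn=5))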





