Using WordNet used to be a very popular way to perform "knowledge-rich" NLP from the late '90s up to around 2010 (approximate timeline). "Knowledge-rich" meant you could start with some understanding of the language and not rely solely on the data at hand, much like the use case that pretrained models like GloVe serve today (WordNet is probably closer to dependency-based word vectors [1]). Some interesting uses were query expansion [2], sense disambiguation [3], word similarities (popular: Wu-Palmer similarity, check out NLTK), and an interesting area called "lexical chains" [4]: groups of related words running through a text, with their "weave" signifying topics.
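To give a flavor of the similarity APIs, here's roughly what Wu-Palmer similarity looks like through NLTK's WordNet interface. A minimal sketch, assuming you've run nltk.download("wordnet") once; the example words are mine:

    # Wu-Palmer similarity via NLTK's WordNet interface.
    from nltk.corpus import wordnet as wn

    dog = wn.synset("dog.n.01")  # first noun sense of "dog"
    cat = wn.synset("cat.n.01")
    car = wn.synset("car.n.01")

    # Wu-Palmer scores in (0, 1], based on how deep the two senses'
    # least common subsumer sits in the hypernym hierarchy.
    print(dog.wup_similarity(cat))  # ~0.86 -- both are mammals/carnivores
    print(dog.wup_similarity(car))  # much lower -- taxonomically unrelated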

The arrival of WordNet on the scene was a big deal, since there weren't many ways to perform knowledge-rich NLP back then; the common ones were using a dictionary or a thesaurus. There was some effort to tie topic models to WordNet too, like LDAWN [5]. And there were extensions built on collocation information gleaned from the glosses, like "eXtended WordNet" [6].
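(The "gloss", if you haven't run into the term, is just a synset's dictionary-style definition plus example sentences, which NLTK exposes directly. A quick sketch:)

    from nltk.corpus import wordnet as wn

    dog = wn.synset("dog.n.01")
    print(dog.definition())  # the gloss: "a member of the genus Canis ..."
    print(dog.examples())    # example sentences: ["the dog barked all night"]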

You still (occasionally) see it used where you need some kind of rich prior knowledge. For example, the "Hierarchical Probabilistic Neural Network Language Model" by Morin and Bengio [7], or cluster labeling (which uses embeddings together with WordNet) [8]. To quote an example from the latter, 'a word cluster containing words "dog" and "wolf" should not be labeled with either word, but as "canids"'. And you know "canids" is the super-category here by looking up the precise relationships in WordNet.
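That super-category lookup is easy to see in NLTK. A minimal sketch of just the WordNet half (the paper [8] combines this with embeddings, which I'm not reproducing here):

    # The lowest common hypernym of "dog" and "wolf" in the noun
    # hierarchy is the canine/canid synset -- a natural cluster label.
    from nltk.corpus import wordnet as wn

    dog = wn.synset("dog.n.01")
    wolf = wn.synset("wolf.n.01")

    common = dog.lowest_common_hypernyms(wolf)
    print(common)                   # [Synset('canine.n.02')]
    print(common[0].lemma_names())  # ['canine', 'canid']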

My own Master's research looked at combining WordNet-based lexical chaining with more "ML"-ish techniques like Hidden Markov Models [9], which is why I know, or rather vaguely remember, some of the stuff that was happening back then :-)

I think the primary reason WordNet did not retain its popularity is that it was a good "one-off" solution. It worked well on "correct" English. You want to adapt it to your domain vocabulary? Heuristics. You want to use WordNet in another language? Well, someone needs to build one first. You want to use it to process internet lingo? Nope, hybrid models and heuristics again. Also, the amount of text available to train on was increasing by leaps and bounds at the time, so the field moving toward ML-heavy techniques made sense.

[1] https://www.aclweb.org/anthology/P14-2050.pdf

[2] https://www.aclweb.org/anthology/P08-1017.pdf

[3] https://pdfs.semanticscholar.org/7f2c/b3e390c5e539ef9089014a...

[4] http://www.cs.columbia.edu/nlp/papers/2003/galley_mckeown_03...

[5] https://wordnet.cs.princeton.edu/papers/jbg-EMNLP07.pdf

[6] https://en.wikipedia.org/wiki/EXtended_WordNet

[7] https://www.iro.umontreal.ca/~lisa/pointeurs/hierarchical-nn...

[8] https://www.aclweb.org/anthology/U18-1008/

[9] https://pdfs.semanticscholar.org/e7ce/34e5acdbb7a91e28fdafa9...
