Context Is Everything: Finding Meaning Statistically in Semantic Spaces

Radim · on April 28, 2018

The article seems a little rough (draft?), but the money shot is the algo in Section 5. Looks pretty neat and straightforward!

TL;DR:

1. Build a special distance(word1, word2) metric using a set of word vectors trained elsewhere (such as GloVe). This distance works better than cosine distance.

2. Given a document (a sentence, a paragraph… basically, a sequence of words), calculate the "importance" of each word as a sigmoid over distances(word, avg(all_words)).

3. To embed a document, simply do a weighted average of its word vectors, where the weight of each word equals the importance above.

ContextArxiv · on April 28, 2018

Hi, author here! I can confirm everything you said. A conference version is in the works, though this one is essentially complete

syllogism · on April 28, 2018

A small suggestion: I think the way you're describing this as "context" might be misleading. You'll probably want to try the same thing on the output of a BiLSTM or CNN model instead of just the word vectors. I think this is more like an attention mechanism --- like tf-idf, you're learning to downweight words which are less informative. Your method is also very similar in effect to the LexRank line of work, which is more computationally expensive and complicated.

ContextArxiv · on April 28, 2018

Hi! Can you clarify? This is a fairly standard use of "context" (http://www.socher.org/index.php/Main/ImprovingWordRepresenta...). While the algorithm can be used with just the covariance of a corpus's word vectors, accounting for the distribution of a context when setting up the distance metric acts to improve performance substantially (https://i.imgur.com/Dnx9c9G.jpg)

SteveJS · on April 28, 2018

‘Attention is all you need’, was the first thought i had when i read the title of your paper. It might be worthwhile to have a small bit explaining the distinction between attention and context as you view them. If i understand correctly you are describing how on to build a highly useful context representation. The application of that context in a neural net would be to focus the attention of the neural network on the parts of the input that are more important. From my point of view your paper here helps answer the question: what directs the attention?

Apologies if all of this is blindingly obvious. I’m a fascinated amateur.

ContextArxiv · on April 28, 2018

Yep, exactly. The current paper presents the contextual model and how to interact with it, and then shows that even simple models (such as the weighted bag of words in the first parent comment) based on it do quite well. I really appreciate the interest and please let me know if you have any questions! :)

syllogism · on April 28, 2018

I just think that titling a paper about a weighted bag-of-words approach "Context is everything" really starts the reader off on the wrong foot. In fact there's almost no context in your model, in contrast to most other sentence learning methods!

ContextArxiv · on April 28, 2018

To be clear, the weighted bag-of-words is a powerful application of the contextual salience measurement, CoSal, but is certainly not the only one.

(In practice, this can also be applied outside of language, for the importance of other semantic vectors in context)

whymauri · on April 29, 2018

Was this a 224n project? I'm hoping to try doing the class online on my own. Do you think the 2017 videos are still pretty in-line with the topics on the current website?

ContextArxiv · on April 30, 2018

Hi! This started off as a 224n project, but it's gone through quite a bit of work since then. The class slides are from the last class are pretty complete. I think they put a lot of time in to keep it updated, especially towards the end

braindead_in · on April 28, 2018

Do you plan to release an implementation? I would definitely like to play around with it.

ContextArxiv · on April 28, 2018

Absolutely. The current priority is conference submission, but releasing a simple interface is definitely the plan, as appropriate

ContextArxiv · on April 28, 2018

https://github.com/ezelikman/Context-Is-Everything The code will be posted here when available!

auvrw · on April 28, 2018

here's an impl of something like (but actually probably nothing like) 1. in the above tl;dr

(disclaimer: i dr)

https://github.com/ransomw/botiful-soup/blob/master/mibot/sr...

as i recall, there's a lot of neato REPL-y graphvis to try in

https://github.com/ransomw/botiful-soup/tree/master/mibot/de...

----

while i'm here, let me mention: Gärdenfors books like "Geometry of Meaning" are hearty food for thought about natural language semantics

sizzle · on April 28, 2018

Very cool. Any interesting uses for this? I want to develop a 'cognitive debiasing' ML system. It will take real time spoken languages and parse it syntactically to output a dibiased version of the input. Let me know if it have any resources or insight on this. Thanks!

gradys · on April 28, 2018

It sounds like you want to be able to generate language with the same basic meaning as the original but without whatever bias was there.

I'd recommend starting by trying to detect the presence of the bias first. Language generation is a very difficult problem in general and this is an especially difficult instance of it. Even detecting bias will be challenging because of how subtle a phenomenon it is, but it's much more tractable than rewriting what was said to not include the detected bias.

3131s · on April 28, 2018

By "debiasing" do you mean word sense disambiguation?

sizzle · on April 28, 2018

Not sure what the technical term is but by debias I meant the ability to detect cognitive biases [1] from real time spoken input and remove them from the output. The goal would be to help humans understand their biased thinking and over time make us less biased in our decision making process.

[1] https://en.m.wikipedia.org/wiki/List_of_cognitive_biases

rhizome · on April 28, 2018

Good luck!

sizzle · on April 28, 2018

Thanks! Even if I get could a few smart individuals to put together a 'bias detector' mvp that would be an awesome portfolio project!