Word2vec, LDA, and introducing a new hybrid algorithm: lda2vec

nl · on Jan 18, 2016

This looks really interesting, but it is pretty hard to follow without the video.

My summary after a quick flick through is that it is a better classification(/clustering?) model for text, because it takes Word2Vec-style similarity into account, which plain LDA doesn't. That sounds like a reasonable approach to me, and nice to see someone get it working.

I think. Comments?

Here is the version with notes. I haven't read this through yet: http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and...

Code here, BTW: https://github.com/cemoody/lda2vec

fnl · on Jan 18, 2016

Here's my attempt at an in-a-nutshell summary for those familiar with the underlying material. Warning: this might be complete nonsense! Chris Moody proposes to replace the technique of summing paragraph vectors (to word vectors) with sparse "LDA-vectors". Then he appends categorical variables (features) to these summed word+LDA vectors and estimates a multinomial mixture over the latent word topics. All this is applied to a conditional probability model to predict the final topic assignments ("topic vectors") for some set of pre-defined groupings of input documents. Finally, he claims the resulting posterior topic assignments are even good enough to predict (supervised) outcomes.

A less mathematical explanation: imagine analyzing books with word2vec, adding LDA results to the word vectors, and adding in a few categorical variables like the year or country in which the book was written. Then use that "super-vector" to assign LDA topic distributions to the respective authors of the books. The final claim (for which he makes a point of stating that his evidence is weak) is that you could use that "author-specific topic vector" to predict, e.g., how popular each author is.

captaindiego · on Jan 18, 2016

I've been trying to get the code to work myself but I keep getting import errors and a few issues with dependencies.

warrenmar · on Jan 18, 2016

Previous work by Chris on word2vec

http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-w...

stared · on Jan 18, 2016

It is a wonderful post! And HN discussions:

https://news.ycombinator.com/item?id=9185091

https://news.ycombinator.com/item?id=10123041

ginger_beer_m · on Jan 18, 2016

I know word2vec and LDA separately, but what does this work do? Somehow combine the word similarities from word2vec when forming LDA topics?

juxtaposicion · on Jan 18, 2016

It combines the (arguably) best properties of both algorithms. Word2vec is local and creates word representations that are powerful and flexible. LDA is global, creating document representations that are less flexible but very interpretable to humans. lda2vec mixes both ideas.
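To make the local/global contrast concrete, here is a rough sketch of the kind of input each model looks at. It assumes the skip-gram flavor of word2vec and plain bag-of-words counts for LDA; the function names and the toy sentences are made up for illustration and are not taken from any library.

```python
from collections import Counter

def skipgram_pairs(tokens, window=2):
    """Local view: (pivot, context) pairs from a sliding window,
    the kind of training example skip-gram word2vec learns from."""
    pairs = []
    for i, pivot in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((pivot, tokens[j]))
    return pairs

def bag_of_words(docs):
    """Global view: one word-count table per document,
    ignoring word order, which is what LDA models."""
    return [Counter(doc) for doc in docs]

# Hypothetical toy corpus of two "documents".
docs = [
    "the boots fit well".split(),
    "the package arrived late".split(),
]

pairs = skipgram_pairs(docs[0], window=1)   # neighbors only
counts = bag_of_words(docs)                 # whole-document counts
```

The pairs never know which document they came from, and the counts never know which words were adjacent; that is the gap a hybrid has to bridge.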

Ultimately, the goal is to use all of the information that is usually available alongside text. Word2vec treats the text like one long string. LDA has the notion of documents. But lda2vec can use more features: for example, the zip code a client comment comes from (so you get regional topics, like outerwear in Vermont or cowboy boots in Texas) or the client ID a comment comes from (so you learn that a client might be sporty, or an expecting mother), in addition to document-level topics (which might surface customer comments like "perfect service!" or package delivery problems). Those topics are readily consumed by analysts and can be used to understand the business from the client's perspective; word2vec, on the other hand, produces representations that are hard for anything but machines to consume.
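A minimal sketch of how those extra features could enter the model, under some loud assumptions: the context used to predict nearby words is the word vector plus one vector per feature (document, zip code, client), and each feature vector is a softmax-weighted mixture of that feature's own topic vectors. Every name, size, and number below is hypothetical; this is not the API of the lda2vec repository.

```python
import math
import random

def softmax(weights):
    """Turn unnormalized weights into proportions that sum to 1."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def mixture(weights, topic_vectors):
    """Feature vector = sum over topics of (proportion * topic vector)."""
    props = softmax(weights)
    dim = len(topic_vectors[0])
    return [sum(p * t[d] for p, t in zip(props, topic_vectors))
            for d in range(dim)]

def context_vector(word_vec, *feature_vecs):
    """Assumed combination rule: element-wise sum of all vectors."""
    out = list(word_vec)
    for vec in feature_vecs:
        out = [a + b for a, b in zip(out, vec)]
    return out

rng = random.Random(0)
DIM, N_TOPICS = 4, 3  # toy sizes, chosen only for readability

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Assumption: each feature has its own set of topic vectors.
doc_topics = [rand_vec(DIM) for _ in range(N_TOPICS)]
zip_topics = [rand_vec(DIM) for _ in range(N_TOPICS)]
client_topics = [rand_vec(DIM) for _ in range(N_TOPICS)]

word_vec = rand_vec(DIM)
doc_weights = [2.0, 0.1, -1.0]     # this comment: mostly document topic 0
zip_weights = [0.0, 3.0, 0.0]      # this zip code: mostly regional topic 1
client_weights = [0.5, 0.5, 0.5]   # this client: no strong preference

ctx = context_vector(
    word_vec,
    mixture(doc_weights, doc_topics),
    mixture(zip_weights, zip_topics),
    mixture(client_weights, client_topics),
)
doc_props = softmax(doc_weights)   # the part an analyst would read
```

In a real model the weights and topic vectors would be learned rather than hand-set. The readable output is the proportions (here `doc_props`), since they say how much of each topic a document, region, or client carries.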

meeper16 · on Jan 18, 2016

Word2Vec is based on an original approach from Lawrence Berkeley National Lab. This was also at the same time that David Blei was working on LDA at Berkeley. https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...

nl · on Jan 18, 2016

It seems like you have made this claim in a few places (Kaggle, and here under at least two different usernames).

Few people seem to agree with you, and whilst there certainly are similarities it looks to me like there are more differences.

I understand that you think your patent is being ignored, but I don't think commenting everywhere that mentions Word2Vec is going to help you.

xuewei4d · on Jan 19, 2016

What does v_client mean on page 108 of the slides?

aerioux · on Jan 18, 2016

awesome work :)