This looks really interesting, but it's pretty hard to follow without the video.
Here's my attempt at an in-a-nutshell summary for those familiar with the underlying material. Warning: this might be complete nonsense! Chris Moody proposes summing sparse "LDA vectors" with the word vectors, in place of the usual paragraph vectors. Then he appends categorical variables (features) to these summed word+LDA vectors and estimates a multinomial mixture over the latent word topics. All of this feeds a conditional probability model that predicts the final topic assignments ("topic vectors") for some set of pre-defined groupings of the input documents. Finally, he claims the resulting posterior topic assignments are good enough to predict (supervised) outcomes.

A less mathematical explanation: imagine analyzing books with word2vec, adding LDA results to the word vectors, and adding in a few categorical variables such as the year or country in which each book was written. Then use that "super-vector" to assign LDA topic distributions to the respective authors of the books. The final claim (and he makes a point of stating that his evidence for it is weak) is that you could use that "author-specific topic vector" to predict, e.g., how popular each author is.
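To make the "summed word + LDA vector" idea concrete, here is a minimal sketch in Python, assuming the formulation I understand from the lda2vec repo: each document (and each extra categorical feature, like year or country) has unnormalized weights that a softmax turns into topic proportions, the mixture of topic vectors under those proportions gives a document vector, and that is added to the pivot word vector to form the context vector. The function names and toy numbers are hypothetical, and here the categorical features are summed rather than appended, which is an assumption on my part.

```python
import math


def softmax(weights):
    # Numerically stable softmax: unnormalized weights -> proportions summing to 1
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_vector(weights, topic_vectors):
    # Weighted sum of topic vectors: sum_k p_k * t_k with p = softmax(weights)
    proportions = softmax(weights)
    dim = len(topic_vectors[0])
    return [sum(p * t[i] for p, t in zip(proportions, topic_vectors)) for i in range(dim)]


def context_vector(word_vec, feature_weights, topic_vectors):
    # Hypothetical helper: start from the pivot word vector and add one
    # topic-mixture vector per feature (document, year, country, ...)
    context = list(word_vec)
    for weights in feature_weights:
        mix = mixture_vector(weights, topic_vectors)
        context = [c + m for c, m in zip(context, mix)]
    return context


# Toy example: two topics in a 2-d embedding space
topics = [[1.0, 0.0], [0.0, 1.0]]
word = [1.0, 2.0]
doc_weights = [0.0, 0.0]  # equal weights -> 50/50 topic mixture
print(context_vector(word, [doc_weights], topics))  # [1.5, 2.5]
```

As I understand it, the real model also adds a Dirichlet-style penalty on those proportions during training to keep them sparse (that is what makes them look like LDA topics); the training loop is not shown here.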
It combines (arguably) the best properties of both algorithms. Word2vec is local: it learns from the words within a small window around each word, and creates word representations that are powerful and flexible. LDA is global: it models word co-occurrence across whole documents, creating document representations that are less flexible but very interpretable to humans. lda2vec mixes both ideas.
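The local/global distinction is easiest to see in the inputs each model works from. A minimal sketch, assuming a skip-gram-style context window for word2vec and a plain bag-of-words count for LDA; the function names and the window size are illustrative, not taken from either library.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    # word2vec-style (local): each word is paired only with neighbours
    # at most `window` positions away
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def doc_term_counts(tokens):
    # LDA-style (global): the whole document becomes word counts; order is ignored
    return Counter(tokens)


tokens = ["cowboy", "boots", "ship", "fast"]
print(skipgram_pairs(tokens, window=1))
# [('cowboy', 'boots'), ('boots', 'cowboy'), ('boots', 'ship'), ('ship', 'boots'), ('ship', 'fast'), ('fast', 'ship')]
print(doc_term_counts(tokens))
```

With window=1, word2vec never sees ("cowboy", "fast") as a pair, while the document-level counts treat all four words as co-occurring in the same document.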
Ultimately, the goal is to use all of the information that is usually available alongside text. Word2vec treats the text like one long string. LDA has the notion of documents. But lda2vec can use additional features, for example the zip code a client comment comes from (so you get regional topics, like outerwear in Vermont or cowboy boots in Texas) or the client ID a comment comes from (so you learn that a client might be a sporty client, or an expecting mother), in addition to document-level topics (which might surface customer comments like "perfect service!" or package delivery problems). Those topics are readily consumed by analysts and can be used to understand the business from the client's perspective; word2vec, on the other hand, produces representations that are hard for anything but machines to consume.
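One common way to make learned topics readable for analysts (as I understand it, the lda2vec demos do something similar) is to list the words whose vectors lie closest to each topic vector. A minimal sketch using cosine similarity; the vocabulary, the vectors, and the "texas"/"vermont" topic names below are made-up toy values, not output from a trained model.

```python
import math


def cosine(a, b):
    # Cosine similarity; assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_words(topic_vec, word_vectors, n=3):
    # Rank every vocabulary word by similarity to the topic vector, keep the top n
    ranked = sorted(word_vectors, key=lambda w: cosine(topic_vec, word_vectors[w]), reverse=True)
    return ranked[:n]


# Toy 2-d embeddings (hypothetical, for illustration only)
vocab = {
    "boots": [1.0, 0.1],
    "saddle": [0.9, 0.2],
    "scarf": [0.1, 0.9],
    "parka": [0.0, 1.0],
}
texas_topic = [1.0, 0.0]    # hypothetical zip-code topic
vermont_topic = [0.0, 1.0]  # hypothetical zip-code topic
print(top_words(texas_topic, vocab, n=2))    # ['boots', 'saddle']
print(top_words(vermont_topic, vocab, n=2))  # ['parka', 'scarf']
```

The word lists are what an analyst would actually read; the raw word2vec vectors themselves carry no such labels.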
My summary after a quick flick through is that it is a better classification(/clustering?) model for text, because it takes Word2Vec-style similarity into account, which plain LDA doesn't. That sounds like a reasonable approach to me, and nice to see someone get it working.
I think. Comments?
Here is the version with notes. I haven't read this through yet: http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and...
Code here, BTW: https://github.com/cemoody/lda2vec