Here's my attempt at an in-a-nutshell summary for those familiar with the underlying material. Warning: this might be complete nonsense!!

Chris Moody proposes to replace the technique of summing paragraph vectors onto word vectors with sparse "LDA-vectors"; he then appends categorical variables (features) to these summed word+LDA vectors and estimates a multinomial mixture over the latent word topics. All of this feeds a conditional probability model that predicts the final topic assignments ("topic vectors") for some set of pre-defined groupings of input documents. Finally, he claims the resulting posterior topic assignments are even good enough to predict (supervised) outcomes.

A more non-mathematical explanation: imagine analyzing books with word2vec, summing LDA results onto the word vectors, and adding in a few categorical variables like the year or country when/where each book was written. Then use that "super-vector" to assign LDA topic distributions to the respective authors of the books. The final claim (for which, as he himself points out, the evidence is weak) is that you could use that author-specific topic vector to predict, e.g., how popular each author is.
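To make the "super-vector" idea concrete, here is a minimal numpy sketch of the combination I described above (not Moody's actual code; all names, shapes, and the random embedding tables are hypothetical stand-ins): a word vector, a sparse document-topic mixture over shared topic vectors, and a categorical feature vector are summed into one context vector, and the per-document topic mixture is then reused as features for a supervised outcome.

  import numpy as np

  rng = np.random.default_rng(0)
  n_vocab, n_topics, n_years, dim = 1000, 20, 50, 64

  # Embedding tables (random stand-ins for learned parameters).
  word_vectors  = rng.normal(size=(n_vocab, dim))   # word2vec-style word vectors
  topic_vectors = rng.normal(size=(n_topics, dim))  # one vector per latent topic
  year_vectors  = rng.normal(size=(n_years, dim))   # categorical feature, e.g. year

  def doc_topic_mixture(doc_weights):
      """Softmax over unnormalized per-document topic weights -> topic mixture."""
      z = doc_weights - doc_weights.max()
      p = np.exp(z)
      return p / p.sum()

  def context_vector(word_id, doc_weights, year_id):
      """Word vector + document-specific topic vector + categorical vector."""
      doc_vec = doc_topic_mixture(doc_weights) @ topic_vectors
      return word_vectors[word_id] + doc_vec + year_vectors[year_id]

  # One "book" with its own (here random, in practice learned) topic weights.
  doc_weights = rng.normal(size=n_topics)
  ctx = context_vector(word_id=42, doc_weights=doc_weights, year_id=7)

  # The conditional word model would score ctx against all word vectors
  # (softmax over word_vectors @ ctx) and fit doc_weights by maximizing
  # the likelihood of the observed context words.
  word_logits = word_vectors @ ctx

  # The fitted per-document (or per-author) topic mixture can then serve as
  # a feature vector for a supervised outcome, e.g. author popularity.
  author_features = doc_topic_mixture(doc_weights)
  print(author_features.round(3))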


