I took a stab at trying to interpret the topics output by this run of LDA. Green is one of the clearest: generally convolutional deep nets, image classification, empirical work.
Brown seems to have picked up on linear algebra. "Vector", "matrix", "tensor" and "decomposition" all get consistently labeled brown, as do "eigenvalues", "orthogonal" and "sparse".
The rest are not as useful. Black almost always has "number", "set", "tree" and "random", but little else. Purple at times seems to signify topic modeling, but also contains "neural" and "feedforward". Blue seems to be the stats topic, containing "Bayes", "regression", "Gaussian", and "Markov" processes. But it also contains random words like "university" and "international".
Overall, very interesting. I wonder if these topics would be even better defined with a higher setting of k.
Karpathy had a different interpretation (in the green bar at the top of the page). For example, purple would be neuroscience.
In addition to adjusting k, another change that might be interesting would be to also include previous years' papers in the model estimation. Changes in component (topic) weights year-over-year could perhaps reveal something about the topics, or the papers.
Yup, it seems k has been fixed since these scripts were first made for NIPS 2012 (?). Some of the more well-established advances since LDA would also likely help, like HDP.
It's only in the last 12 months that it became clear this was possible. The Ng "Zero Shot Learning" paper came out at NIPS 2013, and given the lead time for a paper like that I think they must have started work at about that time.
There are many machine learning libraries that have good implementations of LDA (e.g. Gensim), so it should be "relatively" straightforward to create the topics and clustering based on the abstracts of the papers.
I think there might be confusion about what nl was referring to. Yes, the link is to a list (produced by Karpathy) of papers on which LDA has been performed.
But one of the listed papers is also by Karpathy ("Deep Fragment Embeddings for Bidirectional Image Sentence Mapping"), and I think this might be what nl is complimenting as being done quickly.
When the papers mention that code will be released, is that right now, or when the conference happens? I couldn't find any links to the code in any of the papers, including the Karpathy one.