Stuff Harvard people Like (echen.me)
87 points by ekm2 on Oct 31, 2011 | 10 comments



I'm blown away that UC Riverside is the second most predictive interest for Berkeley. I wonder whether UCR students are somehow showing up in the dataset too, further diluting the predictive power of data that is already "too large and diverse for an overall characterization."


I'm curious whether the info about Harvard students is accurate as well. If there's one university I would imagine gets followed by a lot of non-students it's Harvard.

For a clear illustration, look at their respective Facebook place pages: Harvard: 918,295 likes; Stanford: 248,574; UC Berkeley: 86,214; MIT: 65,339; Caltech: 2,686.

This clearly isn't about the schools' relative sizes or their popularity among past students but the global reach of their respective brand names.

Some of the things listed for Harvard rang very true from my time there (particularly the interests in consulting, New York, private equity, and famous Harvard grads like Conan), while others made less sense to me (Jimmy Fallon?).

Still a very interesting data set though, if only to see what kinds of people are influenced by each school's brand.


Oooooh man, Latent Dirichlet Allocation is cool stuff, especially in the context of topic modeling (which is what Chen is doing here). OP actually wrote a pretty accessible blog post about how he does this sort of thing which you can see at [1].

If you don't want to read it, Mike Jordan has a pretty neat presentation about it at [2]. If you're statistically trained and don't want to view the video, you will probably understand this synopsis: you can view each document in a set of documents as a discrete admixture of some number of topics. If you treat the words in a document as exchangeable (a bag of words), and you choose some number of topics to model (let's say k topics), then you can use a hierarchical Bayesian model, specifically one that places a Dirichlet prior on each document's topic proportions, with each topic being a distribution over words. Because every document is a mixture of the same k topics, the topics share information across documents, which then allows you to generate some pretty useful topics, like the ones from the OP.

If you don't understand that synopsis, then go look at Jordan's talk. It makes it all pretty clear. :)
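
For a more concrete picture of the synopsis above, here is a minimal sketch of the generative story behind LDA: topics are distributions over words, each document draws its own mixture over those shared topics, and each word is drawn by first picking a topic. This only generates toy data; it is not the inference step and not necessarily how the OP fit his model. The function names, the vocabulary, and the values of `alpha`, `beta`, and `k` are all made up for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, doc_len, vocab, k, alpha=0.5, beta=0.5, seed=0):
    """Generate toy documents from the LDA generative process.

    alpha and beta are symmetric Dirichlet parameters (illustrative values).
    """
    rng = random.Random(seed)
    # Each of the k topics is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(k)]
    docs = []
    for _ in range(num_docs):
        # Each document gets its own mixture over the same k shared topics.
        theta = sample_dirichlet([alpha] * k, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word, then pick the word from that topic.
            z = rng.choices(range(k), weights=theta)[0]
            words.append(rng.choices(vocab, weights=topics[z])[0])
        docs.append((theta, words))
    return topics, docs

# Hypothetical vocabulary loosely based on interests mentioned in this thread.
vocab = ["consulting", "private equity", "new york",
         "ruby on rails", "web apps", "startups"]
topics, docs = generate_corpus(num_docs=3, doc_len=8, vocab=vocab, k=2)
for theta, words in docs:
    print([round(p, 2) for p in theta], words)
```

Fitting LDA runs this story in reverse: given only the words, infer the topics and the per-document mixtures.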

[1] http://blog.echen.me/2011/08/22/introduction-to-latent-diric...

[2] http://videolectures.net/icml05_jordan_dpcrp/


Another interesting thing to think about is what sort of student at each university is the most likely to use Quora.

For example, I noticed that Berkeley seemed to have a bit of a bias towards web technologies (Ruby on Rails and the like). While there are certainly plenty of people around who like web apps (and it seems almost everyone has a pet project on the web these days), I suspect that they are overrepresented simply because people making web apps are much more likely to be on Quora than hippies (Berkeley has those too, believe it or not).


Following a school doesn't necessarily mean that a person actually attends it; particularly for prestigious schools, there are plenty of people who aspire to or admire the school but aren't actually part of it. It's a difference worth noting.


It sounds like there was some filtering to try to prevent that, for example, filtering out people who follow both Harvard and Stanford. Interesting anyways, even just to see what people interested in attending certain schools are interested in.
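
A rough sketch of what that kind of filter could look like, keeping only users who follow exactly one school. This is a guess at the idea, not the OP's actual code; the `follows` data and the function name are hypothetical.

```python
def exclusive_followers(follows):
    """Map each school to the users who follow that school and no other."""
    result = {}
    for user, schools in follows.items():
        if len(schools) == 1:
            (school,) = schools  # unpack the single school in the set
            result.setdefault(school, set()).add(user)
    return result

# Hypothetical follow data: user -> set of schools followed.
follows = {
    "user_a": {"Harvard"},
    "user_b": {"Harvard", "Stanford"},  # follows two schools, so dropped
    "user_c": {"Stanford"},
    "user_d": {"Berkeley"},
}
kept = exclusive_followers(follows)
print(kept)
```

Note this removes people who follow several schools, but it can't tell an actual student from an admirer who happens to follow only one.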


> Berkeley, sadly, is perhaps too large and diverse for an overall characterization.

I'm hoping that the author means "sadly" in the sense of "I wish I could properly characterize the school but can't" rather than "sadly, the school is unfocused."


Berkeley's definitely the odd man out. Here are some rough approximations on the number of undergraduates at each institution...

Stanford: ~6,800; Harvard: ~6,600; MIT: ~4,200; Berkeley: ~25,000

Keep in mind also that UCB has about twice as high a percentage of low-income students. So the undergraduate class at UCB is 4-5 times the size of the classes at these elite schools, with 8-10 times the number of low-income students, many of whom are transfer students. Also, UCB's undergrad population is probably more "local", since about 70% come from California. So I'd expect a very different profile for undergrads.
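
A quick back-of-the-envelope check of those multiples, using the rough enrollment numbers quoted above and taking "about twice as high a percentage" as a flat 2x (an assumption for illustration; the variable names are made up):

```python
# Rough undergraduate enrollments quoted earlier in this comment.
undergrads = {"Stanford": 6800, "Harvard": 6600, "MIT": 4200}
berkeley = 25000

# Assumption: Berkeley's low-income share is exactly 2x the other schools'.
low_income_rate_multiplier = 2.0

# How many times larger Berkeley's undergrad population is.
size_ratio = {school: berkeley / n for school, n in undergrads.items()}

# Relative low-income headcount = size ratio * relative low-income share.
low_income_ratio = {
    school: ratio * low_income_rate_multiplier
    for school, ratio in size_ratio.items()
}

for school in undergrads:
    print(school, round(size_ratio[school], 1), round(low_income_ratio[school], 1))
```

That gives size ratios of roughly 3.7x to 6x and low-income ratios of roughly 7x to 12x, in the same ballpark as the 4-5x and 8-10x figures.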

Interestingly, the profile of the graduate student population at all four schools is probably fairly similar, as the elite private colleges admit far more grad students than undergrads, whereas the opposite is true for UCB - and PhD programs at UCs aren't subject to the same geographic restrictions as the undergrad program.


Caltech is the odd one out in the other direction:

Caltech undergrad enrollment: 967


Well, at least SOMEONE likes Klout.

(I do like getting free stuff though)



