Top organizations in CS research (microsoft.com)
55 points by ssn on July 28, 2010 | 38 comments



> # For example, Paper A is written by Author A and Author B. When this paper was published in year 2000, Author A was affiliated with Organization 1, and Author B was affiliated with Organization 2, provided such information was presented in the paper full text or meta data. Now Author A is affiliated with Organization 3, and Author B is with Organization 4. Our algorithm constructs the following relationship:

> # Paper A is related to Organization 1, 2, 3, and 4.

> # Therefore, all citations to Paper A will be contributed to all 4 organizations.

This seems rather ridiculous to me. So if I have 100 publications with at least 5 citations each at some unknown school, then I go to Stanford, all of a sudden Stanford gets 500 extra citations?

Also, I saw no mention about citation filtering. It's not uncommon for small communities to (intentionally or unintentionally) game these kinds of systems by publishing lots of low-quality papers and citing each other a lot. In fact, I didn't even see any mention about filtering out self-citations, which is absolutely necessary.
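
A minimal sketch of the attribution rule quoted above, as I read it: a paper is related to every organization its authors belonged to at publication time plus every organization they belong to now, and each of those organizations gets the paper's full citation count. The function name, the data layout, and the 50-citation figure are made up for illustration; this is not Microsoft's actual implementation.

```python
from collections import defaultdict

def credit_organizations(papers):
    """Credit each paper's full citation count to every related organization.

    Each paper is a dict with hypothetical keys:
      "citations": int
      "authors": list of (org_at_publication, org_now) tuples
    """
    totals = defaultdict(int)
    for paper in papers:
        related = set()
        for org_then, org_now in paper["authors"]:
            related.update((org_then, org_now))
        # Every related organization receives the full count, not a share.
        for org in related:
            totals[org] += paper["citations"]
    return dict(totals)

# Paper A from the quoted example, with an assumed 50 citations.
paper_a = {
    "citations": 50,
    "authors": [
        ("Organization 1", "Organization 3"),  # first author: then, now
        ("Organization 2", "Organization 4"),  # second author: then, now
    ],
}
totals = credit_organizations([paper_a])
print(totals)
```

Under this reading, changing jobs carries a paper's whole citation count along to the new employer, which is the effect I'm objecting to.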


It's easy to remove self-citations, but it's hard (and unfair) to detect and remove citations in the case you mentioned: "a small community ... publishing lots of low-quality papers and citing each other a lot."
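
A rough sketch of why the self-citation part is the easy half, assuming author names are already disambiguated (which the rest of this thread suggests they are not). A citation is treated as a self-citation when the citing and cited papers share at least one author. The function name and the toy data are hypothetical.

```python
def drop_self_citations(citations, authors_of):
    """Keep only citations where citing and cited papers share no author.

    citations: iterable of (citing_paper_id, cited_paper_id) pairs
    authors_of: dict mapping paper id -> set of author names
    """
    return [
        (citing, cited)
        for citing, cited in citations
        if not (authors_of[citing] & authors_of[cited])
    ]

# Made-up papers and citation links.
authors_of = {
    "p1": {"alice", "bob"},
    "p2": {"bob"},
    "p3": {"carol"},
}
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
kept = drop_self_citations(citations, authors_of)
print(kept)  # [('p3', 'p1'), ('p3', 'p2')]
```

Nothing here catches a group of different authors citing each other, which is the hard case.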


It seems like among the top 4, Berkeley has the highest citation/publication ratio, which indicates they publish less (due to size or just pub frequency), but with more meaningful stuff on average. I wonder if this is related to the laid-back culture of Berkeley that students don't have pressure to "publish".

Disclaimer: I am becoming a graduate student at Berkeley soon.


This would be far more indicative if it were scaled to organizational size. Some of the top places are great places, but they're also huge places.


Or at least if they included a column with # citations / publication.


Exactly what I thought. Here are the first 30 complete with terrible formatting:

UPR8011 Centre d'Elaboration des Matériaux et d'Etudes Structurales 187.36

J. Craig Venter Institute 70.09

Sri Lanka Press Institute 56.33

European Molecular Biology Laboratory 43.45

Apple Computer, Inc. 41.84

Wellcome Trust Sanger Institute 36.47

Swiss Institute of Bioinformatics 32.86

Google Inc. 28.5

European Bioinformatics Institute EMBL 28.2

Digital Equipment Corp. (DEC) 28.12

Palo Alto Research Center 27.21

Howard Hughes Medical Institute 26.18

Sun Microsystems Laboratories 24.43

Harvard University Harvard Business School 24.35

AT&T Labs Research 23.04

Santa Fe Institute 21.65

National Institute of Genetics Mishima 21.24

Salk Institute for Biological Studies 20.67

Bell Labs (Lucent Technologies Inc.) 20.53

University of California Berkeley 20.53

Weizmann Institute of Science 20.52

Yahoo Research Labs 20.28

University of London 19.69

Stanford University 19.63

Princeton University 19.51

Massachusetts Institute of Technology 19.14

Cisco Systems, Inc 19.06

Argonne National Laboratory 18.81

Microsoft 18.69

National Institutes of Health 18.34


Interesting. If one looks at your list, which clearly is less "biased" towards academia, it seems like it might indicate more accurately those organizations that produce research that is actually "useful"/influential, no?


More likely it looks like bias favoring cross-disciplinary research. Different fields have different typical citation rates. Biological sciences tend to have orders of magnitude higher impact factors than other fields [1].

So, for example, if your CS paper touches on some bio topic, it will gather a lot more citations, and it will also have a much higher chance of appearing in high impact factor general science publications like Nature/Science.

For the companies, there can be bias from higher availability of freely accessible full-text articles. I remember there was some analysis in Nature of how online availability increases citations [2]. Companies are more likely to put their papers online and also have better "SEO", so more people are going to see and cite their papers.

[1] http://en.wikipedia.org/wiki/Impact_factor

[2] http://www.nature.com/nature/journal/v411/n6837/full/411521a...


It varies by company and era, but companies also pull a lot of citations that aren't really to the research content of the paper so much as to the mere fact that the paper exists, documenting that X was used by Company Y. (Academics also get a bunch of those kinds of throwaway citations, but I think it's considerably higher for citations to industry papers.) It's often useful if you're working in a theoretical area especially to be able to throw in a few cites of the "hey people really use this!" variety--- "similar ideas are used in deployed systems by Microsoft [1] and Google [2]" type things.

For example, if you were to look through the absolutely colossal number of citations that Google's MapReduce paper has gotten, the vast majority are supporting statements like "parallelism is important" or "these kinds of techniques are used in practice" or "a common current approach is". Not that it doesn't have a bunch of citations actually about the paper as well, but I'd suspect that it has more of these "free" kinds of citations than most papers do.


I must be looking at the wrong list, where was Apple Computer, Inc.?


I'll post this on my blog soon.

Here is a list sorted by the ratio, with a cutoff of at least 1000 publications. 4 universities lead by a wide margin: 1. Berkeley 2. Stanford 3. Princeton 4. MIT

Rank                             Organization       Pubs  Citations  Ratio
   1          Wellcome Trust Sanger Institute       1239      45181   36.4
   2                              Google Inc.       2126      60583   28.4
   3                Palo Alto Research Center       1560      42443   27.2
   4                       AT&T Labs Research       5699     131287   23.0
   5     Bell Labs (Lucent Technologies Inc.)       4656      95602   20.5
   6        University of California Berkeley      25809     529791   20.5
   7            Weizmann Institute of Science       4116      84446   20.5
   8                      Yahoo Research Labs       2235      45322   20.2
   9                      Stanford University      30528     599120   19.6
  10                     Princeton University       8393     163755   19.5
  11    Massachusetts Institute of Technology      30211     578190   19.1
  12              Argonne National Laboratory       2723      51228   18.8
  13                                Microsoft      20303     379561   18.6
  14            National Institutes of Health       8817     161731   18.3
  15                        SRI International       3193      58445   18.3
  16                         BBN Technologies       1633      27262   16.6
  17                         Brown University       5075      83557   16.4
  18                       Harvard University      16931     277193   16.3
  19                       Cornell University      11989     195178   16.2
  20                     University of Oregon       2057      32635   15.8
  21                          Rice University       5238      82713   15.7
  22                 University of Washington      14667     228336   15.5
  23                        Intel Corporation       4036      62412   15.4
  24               Carnegie Mellon University      31227     481334   15.4
  25                          Yale University       7140     109336   15.3
  26                                      IBM      23142     352198   15.2
  27       California Institute of Technology       7969     121192   15.2
  28    Lawrence Berkeley National Laboratory       3646      55239   15.1
  29                     Hewlett Packard Labs       5180      76499   14.7
  30      University of California Santa Cruz       4309      62968   14.6
  31                      New York University       7844     114416   14.5
  32           Hebrew University of Jerusalem       4777      69261   14.4
  33 Mitsubishi Electric Research Laboratorie       1911      27387   14.3
  34             Columbia University New York      11380     157882   13.8
  35                 University of California       2275      31547   13.8
  36     University of California Los Angeles      16214     223380   13.7
  37          University of Wisconsin Madison      11919     164035   13.7
  38       Oregon Health & Science University       2035      27992   13.7
  39                    University of Chicago       5299      72038   13.5
  40                      Brandeis University       1524      20597   13.5
  41                                      NEC       1219      16446   13.4
  42               University of Pennsylvania      11177     150223   13.4
  43        Washington University Saint Louis       6646      88489   13.3
  44      University of Massachusetts-Amherst       6436      85339   13.2
  45       University of California San Diego      15449     204661   13.2
  46                  University of Rochester       3876      50691   13.0
  47                  University of Cambridge      13978     181921   13.0
  48             US Naval Research Laboratory       1717      21984   12.8
  49                    University of Toronto      12396     158501   12.7
  50                University College London      10831     136958   12.6
  51                     University of Oxford       9704     122210   12.5
  52   University of California San Francisco       3253      40915   12.5
  53        University of Southern California      17375     218213   12.5
  54      University of Massachusetts Amherst       1381      17166   12.4
  55           University of Colorado Boulder       6574      81660   12.4
  56                 Johns Hopkins University       6917      84569   12.2
  57                   University of Michigan      16286     197470   12.1
  58 University of North Carolina Chapel Hill       6867      82186   11.9
  59                          Duke University       8430     100875   11.9
  60                        Boston University       6744      80477   11.9
  61                       Rutgers University      10092     119986   11.8
  62                    Georgetown University       1182      13938   11.7
  63   University of California Santa Barbara       8238      96305   11.6
  64           École Normale Supérieure Paris       2102      24111   11.4
  65                   University of Virginia       5040      56874   11.2
  66                   University of Maryland      16658     187911   11.2
  67  University of Illinois Urbana Champaign      22084     248645   11.2
  68               University of Texas Austin      14418     160564   11.1
  69                      Tel Aviv University       7936      88271   11.1
  70                        Dartmouth College       2958      32683   11.0
  71                Portland State University       1795      19624   10.9
  72                  University of Minnesota      12559     137294   10.9
  73          University of California Irvine      10365     111280   10.7
  74   Lawrence Livermore National Laboratory       1682      18018   10.7
  75 National Institute of Standards and Tech       3612      38352   10.6
  76             École Polytechnique (France)       1370      14360   10.4
  77                  Northwestern University       7908      82239   10.3
  78  Technion Israel Institute of Technology       7391      76190   10.3
  79           University of British Columbia       9613      98773   10.2
  80                    Nokia Research Center       1683      17288   10.2
  81                    University of Arizona       6378      65277   10.2
  82                  Oregon State University       2499      25469   10.1
  83                    University of Waikato       1572      15804   10.0
  84                 University of Copenhagen       2232      22291   9.98
  85          Case Western Reserve University       2648      26219   9.90
  86       Polytechnic University of New York       1771      17506   9.88
  87                       University of Utah       5313      52423   9.86
  88                     University of Dundee       1088      10727   9.85
  89 Institut National De Recherche en Inform       1081      10581   9.78
  90           University of California Davis       8423      81240   9.64
  91        Commissariat a l'Énergie Atomique       2403      23170   9.64
  92                   Stony Brook University       4358      42017   9.64
  93          Technical University of Denmark       2859      27564   9.64
  94                         Imperial College       4775      45981   9.62
  95           Los Alamos National Laboratory       4206      40442   9.61
  96          Georgia Institute of Technology      17315     165915   9.58
  97            Agricultural Research Service       1419      13596   9.58
  98                  University of Edinburgh       9598      91615   9.54
  99                     Texas Medical Center       1103      10514   9.53
 100                         Institut Pasteur       1315      12468   9.48
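
For anyone who wants to redo the ranking, here is a small sketch of the computation: drop organizations under the publication cutoff, then sort by citations per publication. The four real rows are copied from the table above; the function name and the small lab row are made up to show the cutoff. Printed ratios are rounded, whereas the table appears to truncate.

```python
def rank_by_ratio(rows, min_publications=1000):
    """Return (name, ratio) pairs sorted by citations per publication, highest first."""
    eligible = [r for r in rows if r[1] >= min_publications]
    ranked = sorted(eligible, key=lambda r: r[2] / r[1], reverse=True)
    return [(name, cites / pubs) for name, pubs, cites in ranked]

# (organization, publications, citations)
rows = [
    ("Stanford University", 30528, 599120),
    ("Google Inc.", 2126, 60583),
    ("University of California Berkeley", 25809, 529791),
    ("Wellcome Trust Sanger Institute", 1239, 45181),
    ("Hypothetical Small Lab", 12, 2248),  # invented row, below the cutoff
]
ranking = rank_by_ratio(rows)
for name, ratio in ranking:
    print(f"{name:40s} {ratio:6.2f}")
```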


They lead by a large margin because they are buying their scores. People who have highly-cited papers are going to end up at very prestigious schools. For instance, imagine you publish a paper while at Oregon State on curing cancer and it gets 5,000 citations. Soon, you're offered a tenure-track position at Stanford. According to this scoring system, your cancer paper is now "associated" with Stanford as well as OSU, thus giving Stanford a big boost.


Yes, but when you are at Stanford you are then contributing your knowledge and skills to the Stanford research community. It doesn't matter whether you were at Stanford when you wrote the paper about curing cancer, because wherever a researcher works, their teaching, writing, and other contributions help make that place a top research institution.


Sure, but if you're still performing top-notch research, your future publications at Stanford will show that. The goal is to rank schools based on which ones are the best research universities. The pressure to perform future research is lower for schools that can buy citations in this system.


  44      University of Massachusetts-Amherst
  54      University of Massachusetts Amherst
Uh?


Equally interesting, check out the first of those UMass Amherst links to see an example of how little scrubbing Microsoft did in the case of professor names, as well. The same professor is listed as #4 and #5, whereas combined, he would be #1 at UMass Amherst, and in the top 5 or 6 most-cited professors in the history of MIT or Stanford, with ~15,000 citations.

http://academic.research.microsoft.com/Organization/3912.asp...


That dupe is in the original data as well.


How do you do the formatting like that?



Somewhere (linked from HN?), I saw an article describing a publication scoring system where your score is the largest n such that you have at least n papers each of which has been cited at least n times. I'd be curious how these organizations rank under that system as well.


It's called the h-index: http://en.wikipedia.org/wiki/H-index
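
The definition given above (the largest n such that at least n papers have at least n citations each) can be sketched in a few lines. The function name and the sample citation counts are made up.

```python
def h_index(citation_counts):
    """Largest n such that at least n papers have at least n citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break  # counts only get smaller from here on
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
print(h_index([]))                # 0
```

To score an organization this way, one option would be to pool the citation counts of all its papers and pass them in as one list.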


Microsoft Academic Search already supports H-index and G-index for authors.

for example: http://academic.research.microsoft.com/Author/508835.aspx
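
For comparison, a sketch of the g-index under its usual definition: the largest g such that the g most-cited papers together have at least g squared citations. This variant caps g at the number of papers; whether Microsoft Academic Search computes it exactly this way is an assumption on my part.

```python
from itertools import accumulate

def g_index(citation_counts):
    """Largest g such that the g most-cited papers total at least g*g citations.

    This variant caps g at the number of papers.
    """
    ranked = sorted(citation_counts, reverse=True)
    g = 0
    for position, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= position * position:
            g = position
    return g

print(g_index([10, 8, 5, 4, 3]))  # 5
print(g_index([3, 1, 1]))         # 2
```

Unlike the h-index, one very highly cited paper keeps raising the score, because the running total carries its citations forward.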


It's interesting to note how much Google's (citation) influence has decreased over time. 60583 -> 19354 -> 3893 for all years -> last 10 -> last 5, where "all years" means about 12 (!). I'm guessing it's the exponential falloff of PageRank paper citations over time.


It is important to keep in mind that the older a paper is, the more "chances" it gets to be cited. On average, papers published 20 years ago will have been cited more than papers published 10 years ago. It is fairer to compare the citations each paper received in its first 10 years.

I.e. compare

citations between 1990 and 2000 for paper A published in 1990

vs

citations between 2000 and 2010 for paper B published in 2000
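
A small sketch of that comparison, counting only the citations a paper receives within a fixed window after publication. The function name and the citing years are made up for illustration.

```python
def citations_in_window(publication_year, citing_years, window=10):
    """Count citations received within `window` years of publication (inclusive)."""
    last_year = publication_year + window
    return sum(publication_year <= year <= last_year for year in citing_years)

# Made-up citing years for two hypothetical papers.
paper_a_cited_in = [1991, 1995, 1999, 2004, 2008]  # published 1990
paper_b_cited_in = [2001, 2003, 2006, 2009]        # published 2000

print(citations_in_window(1990, paper_a_cited_in))  # 3
print(citations_in_window(2000, paper_b_cited_in))  # 4
```

On raw totals the older paper is ahead 5 to 4; with the same 10-year window the newer one is ahead 4 to 3.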


I think we're talking about independent observations. I found it remarkable that Google's papers received 60583-19354=41229 citations between (presumably) 1998 and 2000 but only 19354 between 2000 and 2010. That's a pretty staggering difference, especially as the number is an aggregate, as you say. Based on this observation I theorise that the PageRank papers were hugely influential, but the recent published work has paled in comparison. (i.e. those 19354 will also include later citations of the early papers, inflating the 41229 figure further)


I definitely agree with you that PageRank would be one of the most influential papers in Google's history. However, for the year filter, I might be wrong but I think it filters based on the year the paper is published, not the year it gets cited. So when you pick "last 5 years", it won't include the citations for PageRank.
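
To make the two readings of the year filter concrete, here is a sketch with made-up data. Which one the site actually uses is not confirmed anywhere in this thread; both function names are hypothetical.

```python
def citations_by_publication_year(papers, since):
    """Count all citations of papers published in or after `since`."""
    return sum(len(p["cited_in"]) for p in papers if p["published"] >= since)

def citations_by_citing_year(papers, since):
    """Count citations made in or after `since`, whatever the paper's age."""
    return sum(year >= since for p in papers for year in p["cited_in"])

# Made-up data: one old, heavily cited paper and one recent paper.
papers = [
    {"published": 1998, "cited_in": [1999, 2002, 2006, 2008, 2009]},
    {"published": 2007, "cited_in": [2008, 2009]},
]
print(citations_by_publication_year(papers, since=2005))  # 2
print(citations_by_citing_year(papers, since=2005))       # 5
```

Under the first reading the old paper disappears entirely from a "last 5 years" view; under the second its recent citations still count.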


I don't know what conclusions you can really draw from this, but the underlying service (Microsoft Academic Search) is pretty interesting. I've been checking out publications from my school for the last hour.


this seems to be simply a ranking of organisations by the number of citations.

there are a number of factors missed:

1. what's the earliest publication of each organisation?

2. what's the real world impact of the publications? (citations are not real world impact.)

3. what was the significance of an organisation's contribution to a publication? for example, there might be many authors, where only one author did the actual work.

4. it should be normalized by the size of the organisation.

5. CS is multidisciplinary: plenty of people in CS publish in other domains such as biology, mathematics, statistics.

i couldn't work out if they counted self-citations or not.


Re impact: While it's not strictly correlated, the number of citations does indicate to a certain degree the real world impact of a publication. This is usually how you measure "impact" in academia.


It's the wrong way to measure real world impact. It just happens to be an incredibly easy way for tenure and hiring committees to make decisions. There are a ton of problems with it, and many have been pointed out in this thread. One I haven't seen mentioned is that lit reviews are cited disproportionately (especially in the sciences). Many of them aren't even read. Many journals limit the number of citations, so authors cite one lit review rather than many separate studies.


> what's the earliest publication of each organisation?

You can control for that by selecting last 10 or 5 years. That makes Microsoft look even better, though.

> it should be normalized by the size of the organisation.

I guess it depends, but I think it is ok like this. This shows which organization has the biggest impact on computer science.


I don't quite get why the citations for major institutions have grown at a rate of ~10K per year since the late 90s, but much fewer for the decade before that.

Can anyone explain?


Most people weren't publishing as computer scientists but rather electrical engineers and/or mathematicians.

At least that's my assumption. CS is a pretty young field after all.


> 2. what's the real world impact of the publications? (citations are not real world impact.)

so any suggestions on real world impact? winning a Turing Award?


Not sure what data sources they are using here, but there are probably some organisations for which a lot of their work is missing and others where everything is present. If it's similar to Google Scholar, a lot of the stuff my uni does doesn't end up on there.

Fully agree that the number of papers published has no correlation with an organisation's actual overall contribution to the field. Even citations, really, because in some cases something broad will get cited a lot as it relates to a lot of papers without actually presenting anything new or important.


I wonder how much namespace pollution there is? I found at least one incorrect attribution due to identical names.


To report and correct identical names, you can click the <edit> button on top of each author's profile page, then you can make changes to his/her homepage, affiliation, papers, etc. You can even contribute papers.


I wonder how many decades IBM has been in the top 10?



