> # For example, Paper A is written by Author A and Author B. When this paper was published in year 2000, Author A was affiliated with Organization 1, and Author B was affiliated with Organization 2, provided such information was presented in the paper full text or metadata. Now Author A is affiliated with Organization 3, and Author B is with Organization 4. Our algorithm constructs the following relationship:
> # Paper A is related to Organization 1, 2, 3, and 4.
> # Therefore, all citations to Paper A will be contributed to all 4 organizations.
This seems rather ridiculous to me. So if I have 100 publications with at least 5 citations each at some unknown school, then I go to Stanford, all of a sudden Stanford gets 500 extra citations?
Also, I saw no mention about citation filtering. It's not uncommon for small communities to (intentionally or unintentionally) game these kinds of systems by publishing lots of low-quality papers and citing each other a lot. In fact, I didn't even see any mention about filtering out self-citations, which is absolutely necessary.
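To make the quoted rule concrete, here is a minimal sketch of how such an attribution could behave. The function name, the data layout, and the "Unknown School" researcher are assumptions for illustration only, not the ranking site's actual implementation.

```python
from collections import defaultdict

def credit_citations(papers, affiliations):
    """Credit each paper's full citation count to every organization
    any of its authors has ever been affiliated with.

    papers: list of (author_names, citation_count)
    affiliations: author name -> set of organizations (past and present)
    """
    totals = defaultdict(int)
    for authors, citations in papers:
        related_orgs = set()
        for author in authors:
            related_orgs |= affiliations.get(author, set())
        for org in related_orgs:
            totals[org] += citations  # every related org gets the full count
    return dict(totals)

# Hypothetical researcher: 100 papers with 5 citations each.
papers = [(["me"], 5)] * 100
before_move = credit_citations(papers, {"me": {"Unknown School"}})
after_move = credit_citations(papers, {"me": {"Unknown School", "Stanford"}})
```

Under these assumptions, `after_move["Stanford"]` is 500 even though none of the papers were written there, and "Unknown School" keeps its 500 as well.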
It's easy to remove self-citations, but it's hard (and unfair) to detect and remove citations in the case you mentioned: "a small community ... publishing lots of low-quality papers and citing each other a lot."
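As a rough illustration of the easy half, a self-citation filter could look like the sketch below. It assumes a self-citation means the citing and cited papers share at least one author; the paper ids and author names are made up.

```python
def drop_self_citations(citations, authors_of):
    """Keep only citations where the citing and cited papers share no author.

    citations: list of (citing_paper_id, cited_paper_id)
    authors_of: paper id -> set of author names
    """
    return [
        (citing, cited)
        for citing, cited in citations
        if not (authors_of[citing] & authors_of[cited])
    ]

# Hypothetical data: alice cites her own earlier paper p1 from p2.
authors_of = {"p1": {"alice", "bob"}, "p2": {"alice"}, "p3": {"carol"}}
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
kept = drop_self_citations(citations, authors_of)
```

This only handles the direct case. A group of distinct authors citing each other passes straight through, which is the hard part.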
It seems like among the top 4, Berkeley has the highest citation/publication ratio, which indicates they publish less (due to size or just pub frequency), but with more meaningful stuff on average. I wonder if this is related to the laid-back culture of Berkeley that students don't have pressure to "publish".
Disclaimer: I am becoming a graduate student at Berkeley soon.
Interesting. If one looks at your list, which clearly is less "biased" towards academia, it seems like it might indicate more accurately those organizations that produce research that is actually "useful"/influential, no?
More likely it looks like bias favoring cross-disciplinary research. Different fields have different typical citation rates. Biological sciences tend to have orders of magnitude higher impact factors than other fields [1].
So, for example, if your CS paper touches on some bio topic, it will gather a lot more citations, and it will also have a much higher chance of appearing in high-impact-factor general science publications like Nature/Science.
For the companies, there can be bias from higher availability of freely accessible full-text articles. I remember there was some analysis in Nature of how online availability increases citations [2]. Companies are more likely to put their papers online and also have better "SEO", so more people are going to see and cite their papers.
It varies by company and era, but companies also pull a lot of citations that aren't really to the research content of the paper so much as to the mere fact that the paper exists, documenting that X was used by Company Y. (Academics also get a bunch of those kinds of throwaway citations, but I think it's considerably higher for citations to industry papers.) It's often useful if you're working in a theoretical area especially to be able to throw in a few cites of the "hey people really use this!" variety--- "similar ideas are used in deployed systems by Microsoft [1] and Google [2]" type things.
For example, if you were to look through the absolutely colossal number of citations that Google's MapReduce paper has gotten, the vast majority are supporting statements like "parallelism is important" or "these kinds of techniques are used in practice" or "a common current approach is". Not that it doesn't have a bunch of citations actually about the paper as well, but I'd suspect that it has more of these "free" kinds of citations than most papers do.
Here is a list sorted by the citations-per-publication ratio, with a cutoff of at least 1000 publications. The columns are rank, organization, publications, citations, and ratio. 4 universities lead by a wide margin: 1. Berkeley 2. Stanford 3. Princeton 4. MIT
1 Wellcome Trust Sanger Institute 1239 45181 36.4
2 Google Inc. 2126 60583 28.4
3 Palo Alto Research Center 1560 42443 27.2
4 AT&T Labs Research 5699 131287 23.0
5 Bell Labs (Lucent Technologies Inc.) 4656 95602 20.5
6 University of California Berkeley 25809 529791 20.5
7 Weizmann Institute of Science 4116 84446 20.5
8 Yahoo Research Labs 2235 45322 20.2
9 Stanford University 30528 599120 19.6
10 Princeton University 8393 163755 19.5
11 Massachusetts Institute of Technology 30211 578190 19.1
12 Argonne National Laboratory 2723 51228 18.8
13 Microsoft 20303 379561 18.6
14 National Institutes of Health 8817 161731 18.3
15 SRI International 3193 58445 18.3
16 BBN Technologies 1633 27262 16.6
17 Brown University 5075 83557 16.4
18 Harvard University 16931 277193 16.3
19 Cornell University 11989 195178 16.2
20 University of Oregon 2057 32635 15.8
21 Rice University 5238 82713 15.7
22 University of Washington 14667 228336 15.5
23 Intel Corporation 4036 62412 15.4
24 Carnegie Mellon University 31227 481334 15.4
25 Yale University 7140 109336 15.3
26 IBM 23142 352198 15.2
27 California Institute of Technology 7969 121192 15.2
28 Lawrence Berkeley National Laboratory 3646 55239 15.1
29 Hewlett Packard Labs 5180 76499 14.7
30 University of California Santa Cruz 4309 62968 14.6
31 New York University 7844 114416 14.5
32 Hebrew University of Jerusalem 4777 69261 14.4
33 Mitsubishi Electric Research Laboratorie 1911 27387 14.3
34 Columbia University New York 11380 157882 13.8
35 University of California 2275 31547 13.8
36 University of California Los Angeles 16214 223380 13.7
37 University of Wisconsin Madison 11919 164035 13.7
38 Oregon Health & Science University 2035 27992 13.7
39 University of Chicago 5299 72038 13.5
40 Brandeis University 1524 20597 13.5
41 NEC 1219 16446 13.4
42 University of Pennsylvania 11177 150223 13.4
43 Washington University Saint Louis 6646 88489 13.3
44 University of Massachusetts-Amherst 6436 85339 13.2
45 University of California San Diego 15449 204661 13.2
46 University of Rochester 3876 50691 13.0
47 University of Cambridge 13978 181921 13.0
48 US Naval Research Laboratory 1717 21984 12.8
49 University of Toronto 12396 158501 12.7
50 University College London 10831 136958 12.6
51 University of Oxford 9704 122210 12.5
52 University of California San Francisco 3253 40915 12.5
53 University of Southern California 17375 218213 12.5
54 University of Massachusetts Amherst 1381 17166 12.4
55 University of Colorado Boulder 6574 81660 12.4
56 Johns Hopkins University 6917 84569 12.2
57 University of Michigan 16286 197470 12.1
58 University of North Carolina Chapel Hill 6867 82186 11.9
59 Duke University 8430 100875 11.9
60 Boston University 6744 80477 11.9
61 Rutgers University 10092 119986 11.8
62 Georgetown University 1182 13938 11.7
63 University of California Santa Barbara 8238 96305 11.6
64 École Normale Supérieure Paris 2102 24111 11.4
65 University of Virginia 5040 56874 11.2
66 University of Maryland 16658 187911 11.2
67 University of Illinois Urbana Champaign 22084 248645 11.2
68 University of Texas Austin 14418 160564 11.1
69 Tel Aviv University 7936 88271 11.1
70 Dartmouth College 2958 32683 11.0
71 Portland State University 1795 19624 10.9
72 University of Minnesota 12559 137294 10.9
73 University of California Irvine 10365 111280 10.7
74 Lawrence Livermore National Laboratory 1682 18018 10.7
75 National Institute of Standards and Tech 3612 38352 10.6
76 École Polytechnique (France) 1370 14360 10.4
77 Northwestern University 7908 82239 10.3
78 Technion Israel Institute of Technology 7391 76190 10.3
79 University of British Columbia 9613 98773 10.2
80 Nokia Research Center 1683 17288 10.2
81 University of Arizona 6378 65277 10.2
82 Oregon State University 2499 25469 10.1
83 University of Waikato 1572 15804 10.0
84 University of Copenhagen 2232 22291 9.98
85 Case Western Reserve University 2648 26219 9.90
86 Polytechnic University of New York 1771 17506 9.88
87 University of Utah 5313 52423 9.86
88 University of Dundee 1088 10727 9.85
89 Institut National De Recherche en Inform 1081 10581 9.78
90 University of California Davis 8423 81240 9.64
91 Commissariat à l'Énergie Atomique 2403 23170 9.64
92 Stony Brook University 4358 42017 9.64
93 Technical University of Denmark 2859 27564 9.64
94 Imperial College 4775 45981 9.62
95 Los Alamos National Laboratory 4206 40442 9.61
96 Georgia Institute of Technology 17315 165915 9.58
97 Agricultural Research Service 1419 13596 9.58
98 University of Edinburgh 9598 91615 9.54
99 Texas Medical Center 1103 10514 9.53
100 Institut Pasteur 1315 12468 9.48
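For anyone who wants to reproduce the ordering, here is a minimal sketch of the computation, assuming the ratio is simply citations divided by publications. The first three rows are copied from the list above; the "Hypothetical Small Lab" row is made up to show the cutoff.

```python
def rank_by_ratio(rows, min_publications=1000):
    """Return (org, publications, citations, ratio) tuples sorted by
    citations per publication, dropping orgs below the cutoff.

    rows: list of (organization, publications, citations)
    """
    eligible = [
        (org, pubs, cites, cites / pubs)
        for org, pubs, cites in rows
        if pubs >= min_publications
    ]
    return sorted(eligible, key=lambda row: row[3], reverse=True)

rows = [
    ("Stanford University", 30528, 599120),
    ("Google Inc.", 2126, 60583),
    ("Wellcome Trust Sanger Institute", 1239, 45181),
    ("Hypothetical Small Lab", 500, 50000),  # made up; below the cutoff
]
ranking = rank_by_ratio(rows)
```

The listed ratios look truncated to one decimal rather than rounded: 45181 / 1239 is about 36.47 but is shown as 36.4.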
They lead by a large margin because they are buying their scores. People who have highly-cited papers are going to end up at very prestigious schools. For instance, imagine you publish a paper while at Oregon State on curing cancer and it gets 5,000 citations. Soon, you're offered a tenure-track position at Stanford. According to this scoring system, your cancer paper is now "associated" with Stanford as well as OSU, thus giving Stanford a big boost.
Yes, but when you are at Stanford you are then contributing your knowledge and skills to the Stanford research community. It doesn't matter whether you were at Stanford when you wrote the paper about curing cancer because a researcher at any institution will be teaching, writing, and contributing to them being a top research institution.
Sure, but if you're still performing top-notch research, your future publications at Stanford will show that. The goal is to rank schools based on which ones are the best research universities. The pressure to perform future research is lower for schools that can buy citations in this system.
Equally interesting, check out the first of those UMass Amherst links to see an example of how little scrubbing Microsoft did in the case of professor names, as well. The same professor is listed as #4 and #5, whereas combined, he would be #1 at UMass Amherst, and in the top 5 or 6 most-cited professors in the history of MIT or Stanford, with ~15,000 citations.
Somewhere (linked from HN?), I saw an article describing a publication scoring system where your score is the largest n such that you have at least n papers each of which has been cited at least n times. I'd be curious how these organizations rank under that system as well.
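That description matches what is usually called the h-index. A minimal sketch, with made-up citation counts:

```python
def h_index(citation_counts):
    """Largest n such that at least n papers have at least n citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position  # the top `position` papers all have >= position citations
        else:
            break
    return h

# Hypothetical citation counts for five papers.
score = h_index([10, 8, 5, 4, 3])
```

Here `score` is 4: four papers have at least 4 citations each, but there are not five papers with at least 5.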
It's interesting to note how much Google's (citation) influence has decreased over time. 60583 -> 19354 -> 3893 for all years -> last 10 -> last 5, where "all years" means about 12 (!). I'm guessing it's the exponential falloff of PageRank paper citations over time.
It is important to keep in mind that the older a paper is, the more "chances" it gets to be cited. On average, papers published 20 years ago will have been cited more than papers published 10 years ago. It is fairer to compare the citations each paper receives in the same 10-year window after publication.
I.e. compare
citations between 1990 and 2000 for paper A published in 1990
vs
citations between 2000 and 2010 for paper B published in 2000
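A minimal sketch of that fixed-window comparison, assuming we know the year of every citing paper. The citing years below are made up.

```python
def citations_in_window(pub_year, citing_years, window=10):
    """Count citations received from pub_year through pub_year + window."""
    return sum(1 for year in citing_years if pub_year <= year <= pub_year + window)

# Hypothetical citing years for two papers.
paper_a = citations_in_window(1990, [1991, 1995, 2003, 2008])  # published 1990
paper_b = citations_in_window(2000, [2001, 2004, 2009])        # published 2000
```

With these numbers paper A has 4 citations in total but only 2 inside its window, while paper B has 3, so the raw totals and the windowed counts rank them differently.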
I think we're talking about independent observations. I found it remarkable that Google's papers received 60583-19354=41229 citations between (presumably) 1998 and 2000 but only 19354 between 2000 and 2010. That's a pretty staggering difference, especially as the number is an aggregate, as you say. Based on this observation I theorise that the PageRank papers were hugely influential, but the recent published work has paled in comparison. (i.e. those 19354 will also include later citations of the early papers, inflating the 41229 figure further)
I definitely agree with you that PageRank would be one of the most influential papers in Google history. However, for the year filter, I might be wrong, but I think it filters based on the year the paper was published, not the year it gets cited. So when you pick "last 5 years", it won't include the citations for PageRank.
I don't know what conclusions you can really draw from this, but the underlying service (Microsoft Academic Search) is pretty interesting. I've been checking out publications from my school for the last hour.
this seems to be simply a ranking of organisations by the number of citations.
there are a number of factors missed:
1. what's the earliest publication of each organisation?
2. what's the real world impact of the publications? (citations are not real world impact.)
3. what was the significance of an organisation's contribution to a publication? for example, there might be many authors, where only one author did the actual work.
4. it should be normalized by the size of the organisation.
5. CS is multidisciplinary: plenty of people in CS publish in other domains such as biology, mathematics, statistics.
i couldn't work out if they counted self-citations or not.
Re impact: While it's not strictly correlated, the number of citations does indicate to a certain degree the real world impact of a publication. This is usually how you measure "impact" in academia.
It's the wrong way to measure real world impact. It just happens to be an incredibly easy way for tenure and hiring committees to make decisions. There are a ton of problems with it, many of which have been pointed out in this thread. One I haven't seen mentioned is that lit reviews are cited disproportionately (especially in the sciences). Many of them aren't even read. Many journals limit the number of citations, so authors cite one lit review rather than many separate studies.
I don't quite get why the citations for major institutions have grown at a rate of ~10K per year since the late 90s, but much fewer for the decade before that.
Not sure what data sources they are using here, but there are probably certain organisations for which a lot of their stuff is missing and some for which everything is present. If it's similar to Google Scholar, a lot of the stuff my uni does doesn't end up on there.
Fully agree that the number of papers published has no correlation with an organisation's actual overall contribution to the field. Even citations, really, because in some cases something broad will get cited a lot, as it relates to a lot of papers, without actually presenting anything new or important.
To report and correct identical names, you can click the <edit> button at the top of each author's profile page. Then you can make changes to his/her homepage, affiliation, papers, etc. You can even contribute papers.