> The decision of our faculty to make their papers freely accessible online will ensure that the global community of researchers, students, and casual followers of science and engineering will learn about our work at earlier stages, enabling them to put it to use for the benefit of society.
I love how they mention "casual followers of science". To me this is a huge deal. I'm not a member of academia, but I really enjoy reading papers from a wide variety of fields (in the spirit of "learn everything about something and something about everything"). I never intend to stop, and I know lots of "laymen" who do the same.
The unavailability of papers, lack of centralized tools, and terrible search interfaces have been incredibly frustrating. I can't wait for a day when we get a centralized, non-profit, publish/subscribe consumer service for all the papers that ever get published by major research universities (with a good search tool). The value of such a service to the public and society would be enormous. As more and more people get educated and used to dealing with science, this could be as big a deal as Wikipedia.
We're not quite there yet, but this is a huge step by Caltech in the right direction.
To me the engineering audience group is a big deal. Regular engineers who do no research per se might be interested in following research related to their application field to ensure that their developments follow the state of the art. As such they would qualify as more than "casual followers of science", but they aren't researchers either.
Two examples come to mind. First, developers of distributed databases might want to follow the progress on distributed data structures such as CRDTs [1]. Apparently Riak developers at Basho are fast at implementing such new theoretical results in their product [2].
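For anyone who hasn't run into CRDTs: here is a minimal sketch of the simplest one, a grow-only counter (G-Counter), in Python. The class and method names are my own, chosen for illustration; this is not Riak's API and not code taken from [1] or [2].

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one non-decreasing slot per replica (illustrative names)."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever bumps its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge whatever order the merges arrive in.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas take writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, and merging the same state a second time changes nothing. That is the property that lets a distributed database skip coordination on every write.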
I am pretty sure that Basho as an organization is not a subscriber to any pay-walled CS journals. So having such articles as Open Access helps make them discoverable by the engineering community, just by googling, sharing links over social networks, mailing lists or news sites such as HN and reddit.
Similarly, developers of (big or small) data analytics tools might want to follow the research on machine learning algorithms so as to implement the state of the art and empirically evaluate it against the previous baselines. As an engineer contributing to the scikit-learn project this is basically my job. I am no researcher myself, but I (and the many other scikit-learn developers) try to help transfer the results of academic research and make them available to our user community, which reaches beyond the traditional academic circles.
Having researchers publish their results in Open Access venues reduces the friction in transferring new results to practical, production-ready implementations (e.g. as open source projects maintained over the years). Open Access is one of the tools that help break the traditional Researcher / Engineer boundary, just as agile development tools and the devops movement helped break the Developer / Sysadmin boundary. To me Open Access is a fundamental building block of "Agile Science".
> I love how they mention "casual followers of science"
Me too. This is quite beautiful. After I left academia, subscribing to all the journals I used to have access to through my university became economically unfeasible, for obvious reasons.
My alma mater's library has a proxy that you can go through, as if you were sitting right there in the library, and get access to any journal they subscribe to. The catch is that you have to be a current student or an alumni member. Alumni membership costs the equivalent of $40 per year, and it lets you take out books from the library and use that proxy. You should see if your university has something similar.
I agree with you in spirit, but it's getting harder and harder to follow academic papers, even in areas where I have a degree. In areas like Physics (which fascinates me, though I've only had one year in undergrad) it is VERY hard to follow anything academic.
I still applaud what they're doing. If the research is even remotely funded by taxpayers, then everyone should have access to it. (I consider it partially funded if the school gets NSF grants, or is a non-profit)
> I can't wait for a day when we get a centralized, non-profit, publish/subscribe consumer service for all the papers that ever get published by major research universities (with a good search tool)
You probably know about this already, but if you're interested in anything biomedical in nature, the US National Library of Medicine's PubMed interface to the MEDLINE database (also managed by the NLM) is exactly what you describe: http://www.ncbi.nlm.nih.gov/pubmed
MEDLINE is a bibliographic database that indexes all of the abstracts going back to the 1960s from literally every major (and almost every minor) journal that does anything even remotely related to biomedicine, and it also includes a surprising number of physics and CS journals. It features a robust and easy-to-use API, and has since the 1990s. All MEDLINE entries include the obvious bibliographic metadata (titles, authors, abstracts, dates, etc.) but also include human-curated index headings, which make searching way easier. PubMed's interface makes it easy to construct complicated and accurate queries, and the NLM offers online access to their reference librarians, as well.
Furthermore, any published research that is funded by the US National Institutes of Health has to have its full text (including figures, references, etc.) posted to PubMed Central (PMC), which is another online database that the NLM maintains. This is an actual law: if you get money from the NIH, your publications have to end up in PMC following a short embargo period (I think it's about six months post-publication?).
All of the PMC content is available free of charge, and it also has an API. It is not a complete full-text mirror of MEDLINE, because not everything in MEDLINE was NIH-funded, but for research going back over the last 7-10 years or so it is often surprisingly complete.
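To give a feel for the API mentioned above, here is a minimal Python sketch that builds a PubMed search URL for the ESearch endpoint of NCBI's E-utilities. The function name and the search term are just examples I made up; `db`, `term`, `retmax` and `retmode` are standard ESearch parameters. Check NCBI's usage guidelines before sending real traffic.

```python
from urllib.parse import urlencode

# Base URL of the ESearch endpoint in NCBI's E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Return an ESearch URL that asks PubMed for IDs of articles matching `term`."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # the query, same syntax as the web search box
        "retmax": max_results,  # cap on the number of IDs returned
        "retmode": "json",      # ask for JSON instead of the default XML
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Example query (the term is an arbitrary illustration).
url = build_pubmed_search_url("open access publishing", max_results=5)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) returns a list of PubMed IDs, which can then be passed to the companion EFetch endpoint to retrieve the actual records.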
This is one area where biomedicine is way ahead of computer science. As somebody who's worked a lot in both fields, I definitely find myself missing PubMed whenever I need to look for CS literature...
As an aside, the National Library of Medicine is truly one of the hidden jewels of the US government. Its budget is tiny, but it somehow manages to produce fabulous work and performs a vital service for the research community. And almost nobody has heard of it!
> The unavailability of papers, lack of centralized tools, and terrible search interfaces have been incredibly frustrating.
I work at a growing OA publisher (www.ubiquitypress.com ;) and can completely empathize: it's one of a bag of problems we're looking at. The search interfaces on individual journals are the standard mixed affair; however, there are many indexes out there that aggregate articles or just their metadata with decent navigation. Some even specialize in OA: https://www.google.com/search?q=OA+index
I'm not too unhappy with Google Scholar lately. It seems to have nearly complete coverage of anything that can be found in an online paper archive (both institutional archives and journals' own archives), and a better search interface than those archives do. Via extracting references it even indexes a large amount of older material that's not yet digitized; of course it can't link you to PDFs in that case, but it still provides the citation and can return offline papers by title/author/date in search results.
Google Scholar has a reasonable interface, but I find its actual content to be unreliable at best. Downloading references in e.g. BibTeX format, I almost always find significant errors, sometimes in relatively innocuous fields (different formats of a conference name, etc.) but other times in more important fields (authors' names).
It's a great tool, and I use it all the time, but it's certainly no replacement for a properly managed bibliographic database.
Yeah, I use it purely for discovery, not for bibliographic management. I usually just type in BibTeX data myself rather than importing it from anywhere. Often PDFs have the needed info (authors/title/venue/pages/year) inside the document itself, and if not, you can usually search for the article's title and find the info on the publisher's site. In CS, DBLP also has reasonably good curated bibliographic data, but lower coverage and no full-text search.
Congratulations Caltech on a great move! This will only have positive effects for the future of scholarly publishing.
Best part:
>Faculty may still grant exclusive rights to their publishers, either permanently or for an embargoed period, but to do so, they must request a waiver from the open-access policy. At other institutions with open-access policies, such as MIT and Harvard, faculty have requested waivers for about 5 percent of the total number of papers produced, usually to comply with the requirement of a few publishers that want a formal waiver in order to even consider manuscripts for publication.
5%, and in those cases, because of dinosaur publishers. This is a very clear sign of the shift in mentality towards open access.
One nice thing about even having the waiver process, even if it's granted to anyone who requests, is that it adds a slight bureaucratic roadblock that causes some authors to rethink a publication choice they might not have thought much about in the past. You go to publish with $big_journal, the journal sees you're from a university with an open-access mandate and asks you to get an open-access waiver before you can submit a manuscript, and in some percentage of the cases this might cause you to just redirect your manuscript elsewhere. In some cases that might be difficult; some sub-fields don't have as many respected journals as others, or your paper might be something unusual that's a perfect fit for a particular special issue and hard to publish otherwise. But those end up being minority cases.
Having had to sign one such waiver, I can say it's treated much like a standard 'oh, sign this so they'll get moving on your paper'. Again, because it's only for the 'nicer' journals, the 5% number is a bit misleading: this would only apply to the best research. And that's the unfortunate fact: it is too often the good stuff that bypasses the policy.
On the flip side, it is a point of friction, if small. It did give me a very specific moment for my complaint to be registered (even if overridden).
This is a good thing in this case, but adding red tape to intentionally alter incentives is merely one useful tool in regulating markets, not a panacea. It can easily be used, intentionally or not, to work against those aims, e.g. "of course you're allowed to use any supplier, just fill in these forms in triplicate or use our preselected choice."
> Indeed, some publishers, seeking to protect their own investment in scholarly work, have authorized third-party agencies to find articles posted in violation of their contractual rights and to issue Digital Millennium Copyright Act takedown notices that threaten legal action if articles are not removed from the web.
Wow! Really? I am a math professor, and in mathematics I have never heard of this happening. It would strike me as professional suicide on the publishers' behalf: holding a knife to the neck of their golden goose.
Although I am 100% in support of Caltech's policy, I would have guessed that the issue was mostly theoretical and that they were only taking a stand on principle. Either Caltech is bluffing, or I stand corrected. I'm not sure which. But in either case, kudos to them.
I'm a student (and researcher) at Caltech, and very excited to hear about this. I hope Caltech continues to push for open access. I know that many undergraduates would like to have course lectures posted freely online, like MIT's OpenCourseWare.
I do wonder, how common is such a policy at other institutions? I assume this must be uncommon, since it's apparently newsworthy?
They're becoming more common. The entire University of California system already has a similar policy, of institutional open-access archival by default, but with a waiver process for a handful of cases where the author thinks it's needed: http://osc.universityofcalifornia.edu/open-access-policy/
Unfortunately, few Caltech classes have lectures that are actually worth posting online (unlike the MIT ones, which are IMO far superior in teaching quality, for the most part).
I attended Caltech about 35 years ago. I kept my notes, but looking at them now without recalling the context of the lectures makes them a bit on the incomprehensible side.
I sure wish those lectures had been recorded and I could refresh my memory on the more interesting ones.
I've also seen some of the MIT online videos, and my recollection is the Caltech lectures I attended were of comparable quality.
Indeed. It is becoming less uncommon, especially in the UK, where I work, which has a national policy requiring research papers arising from taxpayer-funded work to be published OA. Also, the universities are really leading the way: http://en.wikipedia.org/wiki/Open_access_mandate
It's a big topic at my institution, and here at the Libraries it's constantly being pushed and advocated with faculty and through conferences we play a part in.
> At other institutions with open-access policies, such as MIT and Harvard, faculty have requested waivers for about 5 percent of the total number of papers produced, usually to comply with the requirement of a few publishers that want a formal waiver <in order to even consider manuscripts> for publication.
This is the one caveat. It would be great to out these journals. It seems likely that public/501(c) money is going into the research published in them, too.
It looks like we'll be able to access the MS Word-looking version for free. For the "styled/typeset" two-column version, it looks like people will have to pay for the paywalled version.
At this point I'm left wondering:
1. Is the typeset version that much of a value add? Do people not realize how relatively easy it is to do these things on their own, apart from publishing houses?
2. Is reading a single column version all that bad? We do it on the internet every day.