Introducing The Paper Bay (jacquesmattheij.com)
232 points by david927 on Feb 13, 2013 | 90 comments



Could you allow the requester to provide a link to the paywall where they get stuck? People who have access to the paper through a university subscription could then click that link and be taken straight to the paper. I imagine most people looking for a paper will already have come across this paywall and so will know where to find it. It would significantly reduce the overhead for the person fulfilling the request, but may be against the spirit of the site.

In many cases that I have come across, the DOI can be turned into a link that takes you to the article. Perhaps that would be enough.


> In many cases that I have come across, the DOI can be turned into a link that takes you to the article. Perhaps that would be enough.

This would be a very simple but extremely helpful change. You only need to prepend http://dx.doi.org/ to the DOI and make the resulting link clickable.
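
Not part of the site, just a minimal sketch of the idea in Python, assuming the request record stores the bare DOI string (the function name is made up):

    def doi_to_link(doi):
        # e.g. "10.1038/nphys1170" -> clickable resolver link on the request page
        url = "http://dx.doi.org/" + doi.strip()
        return '<a href="%s">%s</a>' % (url, doi)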


Ok, I'll add that tonight.

edit: done.


You could also add the various other efforts in the open-access domain that were linked on this page to your http://thepaperbay.com/others/ page. There were at least Pirate University, articleak, and /r/scholar.

Also, while it's okay to have a somewhat complex form to fill in when making a request, it's really a shame that sharing a paper is just as much work. It should be very easy: in particular, when clicking "I have this, send it", the form should have only one field, the file upload, or at least have the other fields prefilled with the request's values.


With the exception of old and obscure papers (i.e., 15 years or more and not cited in the last 5 years), I don't really see what the big problem is. I have yet to mail an author and not receive a pdf of the paper, usually within days, sometimes within minutes. When you have a reference and Google, you're seconds away from finding that author's current affiliation & email address. I once emailed the author of a book that retailed for $150+ on some Elsevier subsidiary to ask for a few pages. Turns out the book was basically her PhD thesis (many, many books are like that) and she sent me the pdf of that thesis a few hours after I emailed her. All this brouhaha over access to papers is a storm in a glass of water if you ask me, instigated mostly by those who don't even need the papers but rather are looking for a big meany to pick an ideological fight with. Maybe it differs per field, I don't know.


I think I'm right in saying that by emailing you a copy of the PDF, many authors are breaking their contract with the publisher. It's a few years since I published an academic paper, but I've certainly signed over my rights more times than I care to recall. So while you can get a copy of the paper from the author, officially that is not allowed. Thankfully most academics are not dickheads about this.

The key problem with journals is that much of what they do has been made obsolete over the last couple of decades by the move to electronic document distribution and the wonders of the WWW. Don't forget we are talking about private companies restricting the publication of academic research, much of which has been funded out of the public coffers. Even the peer review is done, for free, by academics. All that is left is the management of the process and the distribution of the publications, both of which I'd wager could be handled by the communities themselves, as beautifully and repeatedly demonstrated by open-source software projects.

What the journals still retain is essentially kudos, but in a very real sense, as academics are largely measured against their publication performance, for which the 'currency' is the impact factor of the journal. Academics need to publish in those journals with a high impact factor so that their departments get a good rating and thus get more government funding (at least here in the UK). The key will be for a community-led journal to attain a decent impact factor. Soon.


    "breaking their contract with the publisher"
I don't believe this is true, generally. Of course, it depends on exactly what they email you, but if it is a pre-print, that is, the PDF you compiled with LaTeX on your laptop, then there is normally no problem.

The publishers only own the copyright to the particular typesetting of the paper (which was originally the useful service they provided academics) and as such, without their special-sauce theme that adds the journal name and page numbers on, it's fine for you to distribute it, host it online for free, etc. (Incidentally, this is also the case with sheet music, which is why it's legal to enter a score into a MIDI program and print it out, but not to photocopy it.)

As I say, this is probably not a universal fact, and will depend on what the terms of the journal submission were, and indeed for the book mentioned there was probably a specific contract at some point. Although, if there were only small changes between the book and the thesis, the author was probably within her rights to email (and indeed host online) the thesis for free.


So why don't we make an effort to publish these sorts of pre-prints?


Some people are making an effort. The ones that spring to mind are

http://arxiv.org/

http://eprint.iacr.org/

And Google Scholar often turns up pre-prints, making them much easier to find:

http://scholar.google.com/


Sure, but the kind of 'paper sharing' services such as proposed by the OP have the same problem. Many journals do allow you to share your own article, although I don't know the details by heart, nor would I even know where to start looking for the publishing conditions of my own papers.

Your last two paragraphs illustrate my point, or rather the myopic viewpoint that these 'paper liberators' advocate - nobody outside of the HN crowd I know (and yes, this is limited to a few fields) even cares about this supposed 'stranglehold'. Well everybody grumbles every now and then, but at least nobody cares enough to really push for change, and the things people grumble about (slow editors, idiotic formatting requirements, ...) are mostly not solved by 'open access' journals. Everybody has access to their universities' libraries, or contacts the authors themselves, or has some research assistant dig them up for them. Yes yes the subscriptions cost the universities real money, but come on, much more money is wasted by universities on other trivial stuff. Look, I'm not saying the model is perfect, it's just that (IMO) it's not a big enough problem for actual researchers (as opposed to 'interwebs poasters with strong opinions about things they have little use for') to really matter.


Admittedly, I'm not saying that ThePaperBay necessarily offers the right solution, but I am refuting your claim that there isn't really a problem. The issue here is that there has always been a problem. Academic output should always have been available to anyone who wanted it _at a fair price_. Until electronic documents became so ubiquitous, you could tolerate the constraints of publisher ownership because they helped overcome real problems of communication and physical distribution. The question is: haven't we now reached the point where those problems have gone away?


> ...it's not a big enough problem for actual researchers (as opposed to 'interwebs poasters with strong opinions about things they have little use for') to really matter.

This division between researchers and everyone else is exactly the kind of elitism that broader access to research is supposed to reduce. People fighting for open access aren't necessarily responding to an existing widespread demand, but anticipating the future benefits to society.

They are trying to be ahead of the curve, somewhat like the oft-maligned RMS was ahead of the curve in his understanding of, and fight against, DRM.


The "trick" is to provide a so called draft instead of the paper you have in the IEEE/ACM format, I wrote so called because it's usually the exact same content with a different formating. The only paper you are legally not allowed to share is the one with the publisher's template (LNCS/IEEE/ACM...). If you go to researcher's web page you'll find that most of them just upload a version without using the IEEE/ACM template on their public page to be on the safe side. However many people don't really care and upload the pdf from the proceeding even when they're not supposed to ;)


Good luck with emailing authors of papers which are 70 years old. In mathematics good results never go out of date, so there is no rule of thumb that "if it is still relevant, the author must still be alive".


But those perennial papers are usually so widely spread that a simple google search will bring them up, or otherwise somebody in the department will have a copy on their hard disk; so finding it is as simple as asking 'hey who has xyz' on the internal mailing list.

Maybe it's different in mathematics, I don't know, I'm just stating my experiences. Few fields are as long-lasting as mathematics, too, I think; in most fields, only a handful of papers are still relevant after 20 years.


It isn't that simple, as a perennial paper in mathematics does not even have to be famous. It may even have no cites (yet). And I'm speaking from my experience (theoretical physics). Moreover, you can hardly expect that one of the guys in your department has the paper.

Sure, for applied (or even experimental) sciences it is different.


How would you know about a paper you don't have, that doesn't have any cites, and that is 50 years old? How can a paper be perennial if it doesn't have any cites? A paper being perennial is, I'd argue, defined by getting cites on a regular basis even after many years.

Who does theoretical physics / mathematics and doesn't have access to a university library? And don't say Ramanujan, people in those circumstances wouldn't have access to the internet, either.


> How would you know about a paper you don't have and doesn't have any cites and is 50 years old?

Google Scholar?

> How can a paper be perennial if it doesn't have any cites?

It was a bit exaggerated. But _in mathematics_ the typical span of citation accumulation is decades, not years. And the typical total citation count is way lower than in, say, biology.

> A paper being perennial is, I'd argue, defined by getting cites on a regular basis even after many years.

No. http://www.thefreedictionary.com/perennial

Don't confuse it with "popular" or even "with lasting popularity".

Again, in mathematics things (almost) do not age...


"But _in mathematics_ typical span of citation accumulation is decades, not years. And typical total citation count is way lower than in, say, biology."

Sure, I'll accept that. My point is that some people must have it apart from the archives of the university it was first published at. Again, I'm talking about the actual, practical issues here, not the "what might happen". Not to turn this into an ad hominem, but are you an academic? How often do you have real problems (as opposed to 'annoyed because I have to spend 15 minutes') finding the content of papers?

"No. http://www.thefreedictionary.com/perennial Don't confuse it with "popular" or even "with lasting popularity"."

By that definition, anything written is perennial. In the context of a book/movie/paper being 'perennial', 'perennial' means 'still enjoys some form of popularity or following after a relatively long time'. Just because a dictionary doesn't define it to that nuance doesn't make it not true.


I know of at least one person who is doing lots of science / scientific projects/experiments for themselves and does not have access via uni libraries etc., but needs to consult the scientific literature (as in, academic papers) fairly often. (Granted, the case does not fall within the aforementioned 'theor.phys. / mathematics' field.) They use http://reddit.com/r/scholar to ask for papers to be uploaded (it's a very nice and useful subreddit for those folk; I try to fill some requests now and then).

Of course, 'knowing one person' is anecdotal / talk about bad representative samples, but I can attest to the fact that such people exist. (I don't know the details of why that person cannot register with a public library; perhaps public libraries don't have subscriptions to the journals of interest.)

In any regard, there are cases where old theories / research were dug up by contemporary researchers who realized that those original models were useful for their modern research etc. [citation needed, could dig something up, but googling is probably more effective than asking in this case.]


But he probably does it because it's just easier than getting a subscription to a university library. Proving my point exactly: how hard is it really to get access to academic papers? Most authors can't even find people who want to read their stuff. This whole 'I can't access research' is really 'there is no website where the 0.1% of people who have neither mainstream nor side-stream access to journals can download them from'. Hardly the big threat to society it is sometimes made out to be.

Let me state again that I too would sometimes prefer one huge, easily accessible database with all articles ever published, along with cites and H-factors and objective impact factor rankings and a bookmarking/personal library feature and maybe a pony too. Then again, the magnitude of that problem is minuscule compared to other problems I have that I'd much rather see solved, or spend my time on. And my experience says most of the people in my field feel the same.


I generally agree. At the same time, it's good to have tools for such edge cases - there are some papers in niche journals etc. which play a part in science but may not be easily obtainable (again, because e.g. public libraries do not pay for subscriptions to those journals and whatnot). But I agree with the gist of what you're saying (trying to come up with a universal, user-friendly system might be overkill); still, it is good to have these frustrations and this debate in the open, and again, I really like that there is such a place as /r/scholar.


But, again, those who can ask on the mailing list also have the university subscription to get through paywalls. People outside research groups are left out of the party.

You are not making a convincing argument that papers are already "free enough". Unless you assume all interested parties are from the academic world, that is...


"Unless you assume all interested parties are from the academic world, that is.."

But seriously, how many people aren't? Yes, yes, there is always that one guy in his attic or log cabin... Most university libraries offer subscriptions to externals, too (mine does for less than 50 USD a year). So it's really only a problem for people who are interested in academic materials and who are outside any reach of a university (because really, all you have to do is become a member of a university library to get access to the electronic materials from the comfort of your own home). How many people fit that criterion? I'm arguing that overall, the problem is mostly in people's minds.


I work in the private sector with no academic affiliations. I find I use various IEEE/scientific papers roughly once a month. Some companies are small, and won't pay for things like papers. Some companies are large and have crappy bureaucracies/"cost saving measures" for getting a $15 paper off the internet. I'm not saying I need it all to be free, because I do always end up with the paper I want, I'd just like to add an anecdote that it's not all academia. But you're mostly right, access is not too much of a problem.


> Most university libraries offer subscriptions to externals, too (mine does for less than 50 USD a year).

Can you show me one that does? You're saying as a non-institutionally affiliated individual, I can pay a university library $50/year and get access to papers? Please, which school offers this, or what search terms should I be using? please. times a googleplexamonium.


Many, if not most, universities in the US offer something like this, from what I've seen. The one issue, though, is that "non-affiliated" users don't always get the same kinds of access as students/professors, at least in terms of accessing things from off-campus.

So, if you, for example, went and paid the UNC-CH library your $25.00 fee for a "borrower's card"[1], you don't get the ability to access all the various digital databases and what-not from your home, but you can drive down to the library, use the computer there, and access basically everything anyone else can. Or you can go down and ask the reference librarian to hunt down a paper for you and get you a paper copy.

On a semi-related note... I'm not sure what other states and jurisdictions have something like this, but here in NC we have something called "NC Live"[2], which is a portal that provides access to all sorts of online digital resources (including many which would otherwise be fairly expensive) to anyone with a library card from pretty much any county/city library in the whole state.

[1]: http://www.lib.unc.edu/circ/borrowers.html

[2]: http://www.nclive.org


One reason to promote more open and universal access to academic materials is to allow journalists and interested laypeople to read the actual papers referenced in news articles and press releases, deciding for themselves whether the press release was accurate. The idea is that more exposure to how research actually works will improve general scientific literacy and put an end to the constant barrage of states wanting to outlaw the teaching of evolution or cosmology.

Having a few academics e-mail an author to ask for a copy might be acceptable. Having thousands of educated laypeople constantly e-mailing authors simply doesn't scale.

Another reason is to permit anyone and everyone to do wide-scale analytics, instead of having to rely on the analyses that big, well-connected entities decide to run.


Are you serious with your comment or trying to be provocative intentionally?

Have you tried to do active research without access to a university journal subscription? I have to ssh into an on-campus box at least two or three times per day to get a paper. I won't necessarily read the whole thing, but glancing at it certainly helps me see what it is about much more than a one-paragraph summary does.

If I didn't have that I would be much less productive.


Indeed, most papers have an electronic copy available from the author's bio page, at least in CS.


It certainly doesn't apply to anything medical.


Presuming you can speak English.


If you don't, how are you going to read a paper in English?


I submitted articleak, which is something very similar, a few days ago, but it didn't get upvoted. Maybe people on Hacker News do not like Tor?

The idea of using Tor is that the site is much harder for authorities to take down, and even if they do break Tor and get to the server, the users will still be entirely protected (there can't be any information about them in the logs or anywhere else).

link: http://articleak.allalla.com/

submission: http://news.ycombinator.com/item?id=5175234


Whether you get upvoted or not has as much to do with time of day of submission and random factors as how much people like it.


We have a thriving, helpful community already up and running at http://www.reddit.com/r/Scholar/.


While I agree with the "manifesto", the solution does not seem that radical. There are many places where people can upload their papers.

Moreover, a web 1.0 interface does not seem to be efficient.

In that line there is already: http://www.pirateuniversity.org/

And... judging by its name (i.e. The Paper Bay) I would expect something much more radical, actually aiming at getting 75TB of _any_ paper content...


Give it some time ;)


Sure. Especially as from links there I'm aware that you are aware of other solutions. :)

So, a reliable, automated and anonymous (?) way to upload books/papers seems to be a must. Or do you have other plans?


It's a bit confusing that the site (ThePaperBay.com) has the same theme as your blog.


This is because they're both using http://octopress.org/


That doesn't really make it any less confusing


Curious... have you seen /r/scholar? I've requested a few papers from /r/scholar (under throwaways, usually), and they've always obliged.


Even within academia, if you don't have access to a paper, it's common enough to ask your friends at other institutions if they can get it for you. We're mostly happy to share knowledge, and we don't feel much solidarity with the publishers who charge £30 for one PDF.


I never knew that Aaron was in the lineage of the guy behind the Mark Williams C Compiler.


Yes.

I was working at MWC when Aaron was born. I think I met Aaron once when he was two or three, just briefly. I knew him from stories told by others, and eventually by following his writings.

MWC was quite an interesting company in its own right.


I have fond memories of the compiler. I bought it when my roommate got an Atari ST in college. Very well documented and I learned a lot from the C source of emacs that came bundled with it. I wish I had that source to browse again today.


It was a very good example of dogfooding: the documentation reflected what the guys working on the compiler and toolchain would want from such a compiler.

I believe the emacs that you are talking about is MicroEmacs, written by Dave Conroy while working at MWC. The source has since been made available generally, and the editor is the one that Linus uses. Daniel Lawrence later took over the distribution and here is one location: http://www.aquest.com/emacs.htm

I used it for quite a while, but then fell into Emacs and didn't look back.

Oh--I did get it to run on my HP200lx and used it there for a while.


Reddit has a channel for paper requests. Might be a useful place to look for 'much needed features'.

http://www.reddit.com/r/Scholar


Great effort! Please add https://peerj.com/ and http://www.mendeley.com/ to your list of resources.


I'll add those tonight, thanks!

Ok. done.


Very good.

But I'd have loved to see The Pirate Bay duplicate itself and create a site exclusively for academic papers. Seeding of obscure papers might be an issue, but I'm sure it would work quite well.


When you download a paper from JSTOR or ScienceDirect, it is quite easy for them to add various watermarks to the PDF, and they already do (mildly). It might make it harder to scale this concept if the provider can track down who initially downloaded the paper (and presumably shared it).


  > When you download a paper from JSTOR or ScienceDirect, it
  > is quite easy for them to add various watermarks to the
  > PDF, and they already do (mildly).
You can strip the watermarks with a tool called pdfparanoia.

https://github.com/kanzure/pdfparanoia

disclaimer: I wrote pdfparanoia because watermarks.
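
For anyone curious what using it looks like: going from memory of the README, the Python usage is roughly the following (the file names are placeholders; check the repo for the actual API):

    import pdfparanoia

    # scrub() takes a file-like object and returns the de-watermarked PDF content
    with open("paywalled.pdf", "rb") as fh:
        cleaned = pdfparanoia.scrub(fh)

    with open("paywalled-clean.pdf", "wb") as out:
        out.write(cleaned)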


request several incarnations and junk the diff


That's fine if you're technical enough to even know what that means; I'm worried about the people working outside the tech realm who could be unaware even of the personal risk when they try to help out.


I got bored reading the post so started to look around the website. I think the Domains for Sale page (http://www.jacquesmattheij.com/auction-of-domains-for-sale/) only clarifies my initial thoughts on this person.


Or you could have a look at this page instead; http://jacquesmattheij.com/thank-hn-our-friend-is-safe-and-s...


... how does selling off unused domains relate to this at all?


I hear he also picks his nose!

I don't care if the guy smuggles cocaine on the side, this specific thing he's doing seems like a really useful service.


The process of filling a request appears to be more complex than it is, due to superfluous form fields. The information about authors, title, and DOI has already been entered by the requester. For the service to be a viable alternative, it should be (and appear to be) extremely simple to fill a request.

Tracking down a paper is time consuming enough, even if you are sitting on the network with access to most journals. Don't make it look like people have to type in additional information manually.


At articleak (see my other comment), we only ask for the link to the paywall, and then, if we could not get it from there (we make the HTTP request via Tor for obvious reasons, so it may fail sometimes), we ask you for the title of the paper, and that's all. So in the worst case you have to fill in two fields, but in most cases only one: the link to the paywall.

However, I would not be as hard as you seem to be on the pirate who made The Paper Bay. When you really need a paper and have already lost half an hour searching for it in the web ocean, it's okay to take 2 more minutes instead of 30 seconds to ask for it on such a service.


Your article is very well worded. I like the idea, but I am not sure if it is reasonable for people "on the inside" to spend time fulfilling requests.

Could there be a js solution to this? A volunteer inside the paywall could leave a tab open which periodically fetches requests from the main paperbay queue and tries to fulfill them (you would need some robot-like functionality to find the PDF link). If it succeeds, it uploads the PDF. If it fails (no subscription?), it can notify paperbay to put the request back in the queue. (A rough sketch of such a poller follows the protocol below.)

With one request every 10 minutes per volunteer and 1000 volunteers, you could have an exodus rate of 6000 papers/hour.

BTW, where are you hosted? How long do you think before they come for you if it gets big?

________________

PS: The protocol could be extended a bit if we need to handle captchas as well. These really piss me off because I can't get the paper via ssh to campus + elinks!

R: requestor F: friend (inside paywall)

   R-->tpb.com     doi:10000x200 PLZ
   tpb.com:        resolve doi:10000x200, prepare scrape recipe.
   tpb.com-->F     could you get jrnl.com/yr/issue/33131/
   F-->jrnl.com    GET ... 
   F<--jrnl.com    CAPTCHA.jpg  +  form el
   R<--tpb.com<--F solve plz ( CAPTCHA.jpg ,  form el )
   R-->tpb.com-->F form ans
   F-->jrnl.com    captcha form submit 
   R<--tpb.com<--F<--jrnl.com    PAPER.pdf
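
Purely to illustrate the polling idea above, here is a rough volunteer-side sketch in Python, as a stand-alone script rather than a browser tab. Every endpoint and field name here is made up; nothing like this API exists yet:

    import time
    import requests  # third-party HTTP library

    QUEUE = "https://thepaperbay.example/api"  # hypothetical queue endpoint

    def poll_once():
        # ask the queue for one pending request (made-up API)
        req = requests.get(QUEUE + "/next").json()
        if not req:
            return
        try:
            # fetch the article through the volunteer's campus subscription
            resp = requests.get(req["publisher_url"], timeout=60)
            resp.raise_for_status()
            requests.post(QUEUE + "/fulfill/" + str(req["id"]),
                          files={"paper": resp.content})
        except Exception:
            # no subscription, captcha, network error: hand it back to the queue
            requests.post(QUEUE + "/requeue/" + str(req["id"]))

    while True:
        poll_once()
        time.sleep(600)  # one attempt every 10 minutes, as suggested above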


Interesting. It would be much more useful if the papers in the 'store' were visible for everyone to see and request.

I wonder, though, how important this 'we will deliver the paper on behalf of that email address' will be in challenging any copyright violation claims...


I have trouble with someone who is also a domain squatter: http://www.jacquesmattheij.com/auction-of-domains-for-sale/


Weird that he didn't use openpapers.com / open-papers.com


That word does not mean what you think it means:

http://en.wikipedia.org/wiki/Cybersquatting


Perhaps it should have magnet links and torrent files like archive.org, an API to upload, and .bib and RDF files in the URL bar for the likes of Zotero.

http://arxiv.org/help/api/index http://archive.org/about/faqs.php#Archive_BitTorrents


The site does not host or link to copyrighted work. I think he thought of a way to make this happen without breaking the law (himself).


That's a great idea - but way to bury the lede there.

(And it just occurred to me that that's journalism's way of saying tl;dr. Except I did read it, just to find out what the heck The Paper Bay is :)


Nice idea. Perhaps evolving this to include raw research data could be interesting... e.g. may I please have a) your research paper and b) the 20GB of trial data you used to come to your conclusions. With this I'll run my own experiments, expanding on your trials and hopefully adding to the wealth of knowledge you so wonderfully kick-started. Cheers.


Yeah sure, here you go. Sincerely, Ms Kate Awesome, Phd


"Store" makes it sound like I should be buying something. Perhaps change the name to "Upload"


Thanks a lot, this is very nice and helpful. However... Aren't you afraid of going to jail?


Is this legal?


It's just a question guys, not a moral judgement.


This is a bit of a grey area that we deliberately avoided with PDFTribute.net; it seems perfectly acceptable to aggregate and index metadata, just as long as you're not actually storing any data.


First request and got the paper already!


My first request also, and I got the paper after an hour! Who said good people are rare to find :)


I guess this will come in handy: https://github.com/kanzure/pdfparanoia


I'm wondering if you could do something like this automatically, so users with access to journals could download a little program, and then other users could send queries via a P2P program, which would end up at nodes with access to journals.

There could be various settings to ensure each node with access to journals throttles the number of downloads.


I kind of hoped for an actual torrent tracker rather than central site that can be easily taken down. But never mind :)


The UI is not very clean, but I love the idea. I wish there were a browse feature. Some of us who aren't in college just enjoy enlightening our brains and would love to see what's already up on the site.


Good luck :)


You're a really good guy, jacques.

And I had no idea about Mark Williams C.


What is stopping the server from being taken down by legal pressure?


Already featuring a request for "cocaine," no author given.

Relying on the benevolence of others may only go so far...


Not any more...

Of course the 'abuse angle' was taken into account, it's a bit annoying but any user facing service will have that aspect.


Get this on Tor/i2p.

It'll never go down...


I agree with you on Tor and/or i2p. I don't know if you can be as categorical as "It'll never go down", but at least it will be significantly harder to get it down. And it can be anonymous, which is important of course for the people offering the service, but also to protect the users of the service…

I don't know if such a website can survive the scientific publishers' lobby, especially when its builder is publicly known.

Edit: it seems this exists, as another comment points out on this discussion page.



