In memory of Aaron Swartz: a collection of PDFs from PDFtribute (edward.io)
74 points by inconditus on Jan 15, 2013 | 18 comments



An important thing to remember is that many journals already permit self-archiving of publications (i.e., uploading a pre-print to a personal server or an institutional repository). In fact, about 70% of large publishers automatically allow some form of self-archiving, and with the others, many authors have successfully attached a copyright addendum to the copyright-transfer document to retain some rights (http://scholars.sciencecommons.org/). FAQ on self-archiving (http://www.eprints.org/openaccess/self-faq/).

At my university, we keep running workshops, and there are student staff in the library who will help upload articles to the repository if you just e-mail them. Still, most academics won't take the five minutes to do this, even when they have the right.

This doesn't mean that the academic publishing system shouldn't change; it absolutely should. And there's also a lot of value in "liberating" academic publications that would otherwise not be free. But I hope people will become more aware of what is already possible, and legal!


Agreed, a lot of people don't know their current rights. One major reason is that this information is typically buried in unintuitive legalese deep in some publisher's website. To work around that problem, this database makes it easy to check what a journal/publisher allows you to do with your publications: http://www.sherpa.ac.uk/romeo/

Most in my field at least allow you to put postprints (the final version of the paper, but not formatted by the journal's typesetters) online, although there are a few stragglers who don't let you do anything.


As long as these PDFs are exposed publicly (and linked to, which a tweet with or without #pdftribute will take care of), they will mostly be indexed by Google Scholar, which does a decent job of extracting metadata using heuristics etc.

Of course, it would be much better if people started embedding machine-readable metadata in PDFs (totally possible, see for example http://code.google.com/p/pdfmeat/), and if there were some agreed-upon format for bibliographic microformats that could be embedded in websites listing articles.
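One existing convention along these lines is the set of Highwire Press-style `citation_*` meta tags described in Google Scholar's inclusion guidelines. A minimal sketch of generating them for an article page follows. The function name and the sample title, authors, and URL are hypothetical, and the exact tag set a given indexer honours is an assumption worth checking against its documentation.

```python
from html import escape


def citation_meta_tags(title, authors, date, pdf_url, journal=None):
    """Render citation_* <meta> tags for one article (hypothetical helper).

    Tag names follow the Highwire Press-style convention that Google
    Scholar's inclusion guidelines describe; treat the set as an assumption.
    """
    tags = [("citation_title", title)]
    tags += [("citation_author", a) for a in authors]  # one tag per author
    tags.append(("citation_publication_date", date))   # e.g. "2013/01/15"
    if journal:
        tags.append(("citation_journal_title", journal))
    tags.append(("citation_pdf_url", pdf_url))
    # Escape values so titles containing &, <, or quotes stay valid HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


print(citation_meta_tags(
    title="An Example Paper",
    authors=["Jane Doe", "John Roe"],
    date="2013/01/15",
    pdf_url="https://example.org/papers/example.pdf",
))
```

These tags go in the `<head>` of the page that links to the PDF, so an indexer can read the metadata without parsing the PDF itself.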

We also eventually need an open alternative to Google Scholar. GS is great, and I use it every day (I love that you can export BibTeX, for example), but it has no API (and never will, because of its deals with publishers), actively resists automated access, and is a black box in terms of how data is gathered. Think of "Open Scholar" versus Google Scholar as analogous to OSM versus GMaps: OSM might not look as pretty or be as consistent in the beginning, but it enables a whole range of applications that GMaps doesn't. (And GMaps at least has a fairly good API, even if it charges for overuse; GS has nothing.)

(These are just some thoughts from experimenting with an open scholar workflow, in which I try to share as much of the "byproduct" of research as possible, including rich notes and summaries and my own bibliography with links to OA pubs where they exist: http://reganmian.net/wiki/researchr:start).

Another thing I've noticed while working on my project, where I try to expose OA links to as many pubs as possible and regularly rescan them to see whether they are still available (and still OA), is how quickly documents disappear. Hosting on personal pages is convenient, but fragile. Ideally, people would upload papers to university repositories, subject repositories like arXiv.org, etc.
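A minimal sketch of such a rescan is below. It assumes that a link counts as "still available" when a HEAD request succeeds, and as "still OA" when the server still returns a PDF rather than, say, an HTML login page. The function names and the three labels are hypothetical and are not taken from the project linked above. The fetch function is injectable so the logic can be tried without network access.

```python
import urllib.error
import urllib.request


def fetch_head(url, timeout=10.0):
    """Send a HEAD request; return (status, content_type) or (None, None)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the final page's status.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        ctype = err.headers.get("Content-Type", "") if err.headers else ""
        return err.code, ctype
    except OSError:  # URLError, timeouts, refused connections
        return None, None


def rescan(urls, fetch=fetch_head):
    """Classify each URL as 'ok', 'not-pdf' (e.g. now a paywall page) or 'gone'."""
    report = {}
    for url in urls:
        status, ctype = fetch(url)
        if status is None or status >= 400:
            report[url] = "gone"
        elif "application/pdf" not in (ctype or "").lower():
            report[url] = "not-pdf"
        else:
            report[url] = "ok"
    return report


# Offline usage with a fake fetcher standing in for real HTTP responses.
fake = {
    "https://example.org/a.pdf": (200, "application/pdf"),
    "https://example.org/b.pdf": (404, "text/html"),
    "https://example.org/c.pdf": (200, "text/html; charset=utf-8"),
}
print(rescan(fake, fetch=fake.get))
```

Some servers answer HEAD with 405 or other errors even when the file exists, so a real scanner would probably fall back to a ranged GET before marking a link as gone.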


Thanks for contributing to this thread - I've been looking for something like this for years!


Let me know if you want to discuss further. I have lots of ideas, but I'm not able to make most of them happen by myself. For example, here are a bunch of (unfinished) notes about an open alternative to Google Scholar, an open social API to share citations/notes, etc.: http://reganmian.net/wiki/ideas_for_scrobblr


Cool effort.

But ... it's a score against JSTOR. It's unorganized.

But ... science is full of noise and crappy publications these days anyway: lots of papers describe yet another way to do the same thing, are unproven, and exist only because everybody has to publish to stay relevant.

Now, how do we really improve science? My suggestion: a big Python framework for each field of study, with implementations of the real algorithms and models for comparison, benchmarking, and even real-life use.

See, for example, ROS (Robot Operating System) in the robotics field. ROS is a basic glue framework where universities and individuals can publish their code. It's decentralized, and it has simulators, so scientists do not need to own the physical robots and can compare (diff) results and algorithms very quickly.

The simulator could have an embedded browser + wiki + Quora-style Q&A that explains X.

evolution: physical paper -> PDF -> simulator.
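A minimal sketch of the core of this proposal is below. It assumes a registry where competing implementations of the same task are filed under one name, all run on shared inputs, and checked against a chosen reference implementation. The `register`/`benchmark` names and the sorting task are hypothetical stand-ins, and this is not how ROS itself is structured.

```python
import time
from typing import Callable, Dict, List

# Hypothetical registry: task name -> {implementation name -> function}.
REGISTRY: Dict[str, Dict[str, Callable]] = {}


def register(task: str, name: str):
    """Decorator that files an implementation under a shared task name."""
    def wrap(fn: Callable) -> Callable:
        REGISTRY.setdefault(task, {})[name] = fn
        return fn
    return wrap


def benchmark(task: str, inputs: List, reference: str) -> Dict[str, dict]:
    """Run every implementation of `task` on the same inputs.

    Records wall-clock time and whether each implementation's outputs
    match those of the reference implementation.
    """
    impls = REGISTRY[task]
    expected = [impls[reference](x) for x in inputs]
    results = {}
    for name, fn in impls.items():
        start = time.perf_counter()
        outputs = [fn(x) for x in inputs]
        elapsed = time.perf_counter() - start
        results[name] = {"seconds": elapsed, "agrees": outputs == expected}
    return results


@register("sort", "builtin")
def builtin_sort(xs):
    return sorted(xs)


@register("sort", "insertion")
def insertion_sort(xs):
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:  # shift larger items right
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


if __name__ == "__main__":
    data = [[5, 3, 1, 4], [2, 2, 0], []]
    for name, r in benchmark("sort", data, reference="builtin").items():
        print(f"{name}: agrees={r['agrees']} time={r['seconds']:.6f}s")
```

In a shared framework, the registry would be populated by packages contributed from different groups, which is what would make side-by-side comparison cheap.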


It's not meant to be a competitor to JSTOR, as much as this is a statement in honor of someone.

A framework like that would be awesome, but that has a different meaning from the collection of personal pdf posts/uploads each individual on Twitter contributed.


For someone who has been working on OA and scholarly publications for a few years, it's a bit tricky to enter these kinds of debates. On the one hand, I want to respect Aaron's legacy, and am very touched by the spontaneous and organized moves to honor him. On the other hand, I am very interested in these issues, and love discussing them, seeing how we can do things better.

For smogzer: there is some interesting research on how knowledge from the research literature can be better represented, for example using concept mapping; see this great paper by Simon Buckingham-Shum (who has written many others): http://oro.open.ac.uk/6463/1/kmi-04-28.pdf. Anita de Waard has given many presentations on semantic and executable papers, for example http://www.slideshare.net/anitawaard/executing-the-research-....


> But ... it's a score against JSTOR. It's unorganized.

I'd imagine competing against an organization that's been around for 18 years won't take 6-7 hours of coding. :) It's an MVP, one I'm quite embarrassed about. But I'll continue coding. Someone suggested an open Google Scholar approach - that's one direction this project could head in.


Cool stuff, have you seen this yet?

http://pdftribute.net/


I think this website is different from PDFtribute.net because it actually collects and stores the PDFs rather than just having links to the Twitter posts.

From the 'About' section of the website, you can see it uses PDFtribute.net to help scrape links.
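The scraping pipeline itself isn't described here. As a hedged illustration only, the core filtering step (pulling candidate PDF URLs out of tweet text) might look like the following. The `pdf_links` helper and the regex are hypothetical. Tweeted links are often shortened, so a real scraper would also need to resolve redirects before this check means much.

```python
import re
from urllib.parse import urlsplit

# Rough URL matcher: stops at whitespace, angle brackets, and quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def pdf_links(text):
    """Return URLs in `text` whose path ends in .pdf, in order, without duplicates."""
    found = []
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:!?)")  # drop trailing punctuation from the prose
        # Check only the path, so query strings like ?dl=1 don't interfere.
        if urlsplit(url).path.lower().endswith(".pdf") and url not in found:
            found.append(url)
    return found


tweet = "My thesis, free: http://example.edu/~me/thesis.PDF #pdftribute (also http://example.edu/page)"
print(pdf_links(tweet))
```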


I missed that part (the use of pdftribute.net for scraping), thanks!


We're working with the creator of edward.io to get some kind of integration between the two sites, and we're also working on a search/index/analysis tool for pdftribute.net.


Looking through the uploaded files, I found many more non-English documents than I expected; two of the ones I clicked on at random weren't in English. It's amazing how much support there is from around the world.


It is a great loss that such an entrepreneur has died because of legal problems. I myself have been in a similar situation. I feel that Aaron was a martyr for opening up academic papers. Unfortunately, he will not see his impact on this modern, technology-dependent world.

R.I.P. Aaron Swartz!


Nice short article: 10 things you can do to really support Open Access: http://phylogenomics.blogspot.de/2013/01/10-things-you-can-d...


Nice idea!

Also, some metadata aggregation (title, author, tags, date published) capabilities wouldn't hurt anyone.
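A minimal sketch of what that aggregation could look like is below, assuming each uploaded PDF gets a small record with the four fields mentioned and the site groups records by tag, newest first. The `PaperRecord` type, `index_by_tag` helper, and sample entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class PaperRecord:
    """Hypothetical metadata record for one uploaded PDF."""
    title: str
    authors: List[str]
    published: date
    tags: List[str] = field(default_factory=list)
    url: str = ""


def index_by_tag(records: List[PaperRecord]) -> Dict[str, List[PaperRecord]]:
    """Group records under each (lower-cased) tag, newest first within a tag."""
    index: Dict[str, List[PaperRecord]] = defaultdict(list)
    for rec in records:
        for tag in rec.tags:
            index[tag.lower()].append(rec)
    for recs in index.values():
        recs.sort(key=lambda r: r.published, reverse=True)
    return dict(index)


papers = [
    PaperRecord("Paper A", ["Doe"], date(2011, 5, 1), ["Biology"]),
    PaperRecord("Paper B", ["Roe"], date(2012, 3, 9), ["biology", "open access"]),
]
for tag, recs in index_by_tag(papers).items():
    print(tag, [r.title for r in recs])
```

Lower-casing tags is one simple normalization choice, so that "Biology" and "biology" land in the same group.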


In memory of Swartz: 1 million ebooks for free download

http://ebookoid.com



