
> The U.S. Department of Energy (DOE) today unveiled … a Web portal that will link to full-text papers a year after they're published.

> Open-access advocates such as University of California, Berkeley, biologist Michael Eisen slammed CHORUS when publishers announced the program last year. They prefer a full-text government archive like PubMed Central so it is possible to “text mine,” or search across the entire body of papers. “Under this [DOE] plan, the public's ability to download, text/data mine, and digitally analyze these articles is severely limited,” SPARC’s Joseph agrees.

> But Frederick Dylla … says there is little demand for text mining. He says AIP has never gotten a request for its more than 1 million articles; Elsevier, the publishing giant, gets only about six requests a year, he says. Text mining journal articles is “a field that's just beginning,” he says.

Twelve months is a joke on the timescale of scientific research; most papers have already been through up to a year of preparation before publication. I would like to see a plan that isn't just paying lip service to the ideals of open access. And Eisen is right about data mining: most journals are terrible in that regard. Even the wonderful arXiv.org, a groundbreaking leader of the movement, doesn't provide citation/reference metadata in its API. I had to write a scraper to map out an arXiv citation network, and only a few subfields have enough information to do that. Maybe scientists would make more requests if the APIs were better and the citation information wasn't copyrighted or obfuscated.
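For anyone curious what the core of such a scraper might look like, here is a minimal Python sketch (not the scraper mentioned above). It assumes you have already pulled each paper's reference list as plain text. The regex and the helper names `extract_arxiv_ids` and `build_citation_graph` are hypothetical, and the pattern is only a heuristic: it misses references that cite just the journal version and can catch stray numbers that happen to look like new-style IDs.

```python
import re
from collections import defaultdict

# Heuristic pattern (hypothetical, not an official arXiv regex):
#   new-style IDs such as 1501.00001 or 0704.0001
#   old-style IDs such as hep-th/9901001 or math.AG/0309136
# A trailing version suffix like "v2" is matched but not captured.
ARXIV_ID = re.compile(
    r"\b(\d{4}\.\d{4,5}|[a-z-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?\b"
)


def extract_arxiv_ids(reference_text):
    """Return arXiv IDs found in a reference list, in order of first appearance."""
    found = []
    for match in ARXIV_ID.finditer(reference_text):
        arxiv_id = match.group(1)
        if arxiv_id not in found:
            found.append(arxiv_id)
    return found


def build_citation_graph(papers):
    """Map each paper's arXiv ID to the set of arXiv IDs it cites.

    papers: dict of {arxiv_id: raw reference-list text}.
    Self-citations (a paper's own ID appearing in its text) are skipped.
    """
    graph = defaultdict(set)
    for paper_id, refs in papers.items():
        for cited in extract_arxiv_ids(refs):
            if cited != paper_id:
                graph[paper_id].add(cited)
    return dict(graph)


if __name__ == "__main__":
    sample = {
        "1501.00001": "[1] A. Author, arXiv:1412.12345v2. [2] B. Author, hep-th/9901001.",
        "1412.12345": "[1] C. Author, hep-th/9901001; see also arXiv:1412.12345.",
    }
    for paper, cites in build_citation_graph(sample).items():
        print(paper, "->", sorted(cites))
```

The resulting adjacency sets drop straight into a graph library if you want PageRank or community detection on top.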




I have been meaning to set up an arXiv mirror (they have instructions on how to do this) and then run all the papers through Elasticsearch.
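The basic idea behind full-text search engines like Elasticsearch is an inverted index: a map from each term to the documents containing it. A toy Python version, assuming the mirrored papers have already been converted to plain text and keyed by arXiv ID. The `InvertedIndex` class is hypothetical, does only boolean AND matching, and has none of Elasticsearch's relevance ranking, text analysis, or sharding.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy full-text index mapping each term to the set of doc IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index every term of a document under its ID."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids creating empty postings for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("1501.00001", "Deep learning for galaxy classification")
    idx.add("1412.12345", "Galaxy rotation curves and dark matter")
    print(sorted(idx.search("galaxy")))
    print(sorted(idx.search("galaxy dark")))
```

Intersecting posting sets is why multi-term queries stay fast even over a full mirror: each term only touches the documents that contain it.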

I'd love to run neural nets on the abstracts or similarity algorithms on the equations. It would be fun to run SIFT on the equations and then use deep learning to detect branches of mathematics. Or extract molecular symbols from figures.
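Before reaching for neural nets, a classic baseline for abstract similarity is TF-IDF weighting plus cosine similarity. This is a swapped-in technique, not something proposed above. Below is a stdlib-only sketch, assuming abstracts are keyed by arXiv ID; the helper names are hypothetical and the smoothed IDF formula is one common choice among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each doc ID to a sparse {term: tf-idf weight} vector.

    Uses term frequency = count / doc length and a smoothed
    idf = log((1 + N) / (1 + df)) + 1, so terms present in every
    document still get a nonzero weight.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors[doc_id] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(vectors, doc_id):
    """Return the other doc ID whose vector has the highest cosine similarity."""
    target = vectors[doc_id]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != doc_id]
    return max(scored)[1]


if __name__ == "__main__":
    abstracts = {
        "a": "dark matter halo rotation curves of spiral galaxies",
        "b": "rotation curves and dark matter in dwarf galaxies",
        "c": "a new algorithm for sorting integers in linear time",
    }
    vecs = tfidf_vectors(abstracts)
    print(most_similar(vecs, "a"))
```

Swapping the TF-IDF vectors for learned embeddings keeps the same `cosine`/`most_similar` plumbing, which makes this a handy baseline to compare a neural model against.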



