While nice, it really only works when the reference has a direct link to the PDF, whereas in the sciences the majority of citations use a DOI.

DOI traversal would be required.
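A first step in any DOI traversal is turning whatever string a reference contains into a bare DOI and a resolver URL. This is a minimal sketch; the prefix list and the DOI regex are illustrative assumptions, and the resolver URL only redirects to the publisher's landing page, not necessarily to a PDF.

```python
import re
from urllib.parse import quote

# Prefixes commonly pasted in front of a bare DOI (illustrative, not exhaustive).
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

def normalize_doi(raw: str) -> str:
    """Strip whitespace and common prefixes, returning a bare DOI like '10.1155/2010/963926'."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Rough sanity check: DOIs start with "10.", a numeric registrant code, then "/".
    if not re.match(r"^10\.\d{4,9}/\S+$", doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi

def resolver_url(raw: str) -> str:
    """doi.org URL that redirects to the publisher's landing page."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")
```

For example, `resolver_url("doi: 10.1155/2010/963926")` gives `https://doi.org/10.1155/2010/963926`.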

DOIs are great for humans; unfortunately, they take you to the publisher's webpage, and I don't know of a standard way to get the actual PDF from a DOI. Maybe with PLOS: I know they're good at serving up different versions (XML via a different Accept header, IIRC).
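Asking for a different representation via the Accept header looks roughly like this. doi.org does document content negotiation for metadata formats such as CSL JSON (`application/vnd.citationstyles.csl+json`); whether a given publisher, PLOS included, returns full-text XML or a PDF this way is an unverified assumption. The helper names are hypothetical.

```python
import urllib.request

def build_request(url: str, accept: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for a specific representation."""
    return urllib.request.Request(url, headers={"Accept": accept})

def fetch(url: str, accept: str) -> tuple[str, bytes]:
    """Send the request; return the Content-Type the server actually chose, plus the body."""
    with urllib.request.urlopen(build_request(url, accept)) as resp:
        return resp.headers.get("Content-Type", ""), resp.read()
```

Checking the returned Content-Type matters because a server may ignore the Accept header and send HTML anyway.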

Searching the page for something that looks like a "Download PDF" button and trying that might get you 80% of the way there, and you could at least give the user the remaining DOI URLs to visit themselves.
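The "look for a download PDF button" heuristic might be sketched with the standard library's HTML parser. The matching rule (href ends in `.pdf`, or the link text mentions "pdf") is an assumption and will produce both misses and false positives on real publisher pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PdfLinkFinder(HTMLParser):
    """Collect <a> hrefs that look like PDF downloads (a heuristic, not a standard)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.candidates: list[str] = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).lower()
            path = self._href.lower().split("?")[0]
            if path.endswith(".pdf") or "pdf" in text:
                url = urljoin(self.base_url, self._href)  # resolve relative links
                if url not in self.candidates:
                    self.candidates.append(url)
            self._href = None

def find_pdf_links(html: str, base_url: str) -> list[str]:
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.candidates
```

Anything this returns still has to be fetched and checked; an empty result is the cue to hand the DOI URL back to the user.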

This is actually really relevant to a lot of problems I see so if anyone has a general solution / a 90% solution then I'm all ears :)


The standard way to get the actual PDF from a DOI, when it's a Crossref DOI (which it probably is), is to use the full-text link available in the Crossref API.

For DOI 10.1155/2010/963926

http://api.crossref.org/works/10.1155/2010/963926

In the returned JSON, under message -> link, there's the PDF!

    [
      {
        "intended-application": "text-mining",
        "content-version": "vor",
        "content-type": "application/pdf",
        "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.pdf"
      },
      {
        "intended-application": "text-mining",
        "content-version": "vor",
        "content-type": "application/xml",
        "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.xml"
      }
    ]
Publishers are still getting round to including full-text links in their metadata, but there are 16,000,000 DOIs with such data. Not all of them are open access, however.
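Pulling the full-text links out of that response can be sketched as follows. It assumes the response shape shown above (`message` -> `link`, each entry with `content-type` and `URL`); many records have no `link` array at all, and a URL being listed does not mean it is openly accessible. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

API = "https://api.crossref.org/works/"

def fetch_work(doi: str) -> dict:
    """Fetch the Crossref metadata record for a DOI (makes a network call)."""
    with urllib.request.urlopen(API + quote(doi, safe="/")) as resp:
        return json.load(resp)

def full_text_links(record: dict, content_type: str = "application/pdf") -> list[str]:
    """Return full-text URLs of the given content type from message -> link."""
    links = record.get("message", {}).get("link", [])
    return [link["URL"] for link in links
            if link.get("content-type") == content_type and "URL" in link]
```

Usage would be `full_text_links(fetch_work("10.1155/2010/963926"))`, falling back to `"application/xml"` or to the landing page when the list is empty.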

When a PDF has Crossref CrossMark, the DOI is embedded in the metadata (I can't say how but I can find out)
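Until the exact mechanism is confirmed, a crude approach is to scan the PDF bytes for a DOI. This sketch assumes the DOI appears as plain text, for instance in an uncompressed XMP packet inside a `<prism:doi>` element; both that tag name and the fallback regex are assumptions, and metadata stored in compressed streams will not be found this way.

```python
import re
from typing import Optional

# Assumed: the DOI sits in an uncompressed XMP element such as <prism:doi>...</prism:doi>.
_TAGGED = re.compile(rb"<prism:doi>\s*(10\.\d{4,9}/[^<\s]+)\s*<")
# Fallback: any DOI-looking string anywhere in the raw bytes.
_BARE = re.compile(rb"\b(10\.\d{4,9}/[^\s<>\"()]+)")

def find_embedded_doi(pdf_bytes: bytes) -> Optional[str]:
    """Return the first DOI found in the raw PDF bytes, or None."""
    m = _TAGGED.search(pdf_bytes) or _BARE.search(pdf_bytes)
    if not m:
        return None
    # Drop trailing punctuation that the bare pattern may have swallowed.
    return m.group(1).rstrip(b".,;").decode("ascii", errors="replace")
```

Usage: `find_embedded_doi(open("paper.pdf", "rb").read())`. A proper PDF library that decompresses streams would be more reliable.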

http://www.plosone.org/article/fetchObject.action?uri=info:d...

Drop us a line on labs@crossref.org


Thanks, I'd not thought of the Crossref API for this. I use the API pretty heavily for other things, though; really good work!

Just noticed this part of the response:

    "affiliation": [],
How well populated is that field? I find affiliation data is currently poorly provided on many sites (there are meta tags, but they're often wrong).
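One way to gauge how well the field is filled in for a particular record is to count authors with a non-empty `affiliation` list. This sketch assumes the work record has the usual `message` -> `author` array, each author carrying an `affiliation` list like the one above; the function name is hypothetical.

```python
def affiliation_coverage(record: dict) -> float:
    """Fraction of authors in a Crossref work record with a non-empty affiliation list."""
    authors = record.get("message", {}).get("author", [])
    if not authors:
        return 0.0
    with_affiliation = sum(1 for author in authors if author.get("affiliation"))
    return with_affiliation / len(authors)
```

Running this over a sample of records gives a rough per-publisher picture, which is more informative than a single yes/no.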

> When a PDF has Crossref CrossMark, the DOI is embedded in the metadata (I can't say how but I can find out)

That's likely to come in really useful, thanks.


    http://api.crossref.org/works?filter=has-affiliation:true

    => total-results: 964,696
Do ask us questions on labs@crossref.org or raise a ticket on https://github.com/CrossRef/rest-api-doc
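The same kind of count can be scripted against the `/works` endpoint. This is a sketch assuming the `filter` syntax shown above and that `rows=0` returns only the summary, with the count in `message` -> `total-results`; the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

def works_query_url(filters: dict, rows: int = 0) -> str:
    """Build a Crossref /works query URL; rows=0 asks for the count only."""
    filter_str = ",".join(f"{key}:{value}" for key, value in filters.items())
    return "https://api.crossref.org/works?" + urlencode({"filter": filter_str, "rows": rows})

def total_results(response: dict) -> int:
    return response["message"]["total-results"]

def count_works(filters: dict) -> int:
    """Network call: number of works matching the filters."""
    with urllib.request.urlopen(works_query_url(filters)) as resp:
        return total_results(json.load(resp))
```

For example, `count_works({"has-affiliation": "true"})`; the number will have moved on from the figure quoted above.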


Maybe consider a Google Scholar integration? Its search results sometimes include links to full-text PDFs. Even when they don't, the extracted links to publisher websites could be helpful for a batch review of referenced articles.


Not wishing to nit-pick, but a DOI is a link to the publisher's website.


afandian's reply is exactly how I would go about it. Most DOIs in science papers are CrossMark DOIs or convertible to one.


Good point, and I will definitely take a look at that!

Would you open an issue on GitHub, and perhaps reference a few papers?


Sure, I'll do it this evening
