While nice, it really only works when the reference has a direct link to the PDF...

IanCal · on Oct 26, 2015

DOIs are great for humans; unfortunately they'll take you to the publisher's webpage, and I don't know of a standard way of getting an actual PDF from a DOI. Maybe with PLOS, I know they're good at serving up different versions (xml with a different accept header iirc).

Searching for something on the page that looks like a download PDF button and trying that might get you 80% of the way there, along with at least giving the user the remaining DOI urls to visit themselves.
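
A minimal sketch of that "look for a download PDF button" heuristic, using only the Python standard library. The class and function names are hypothetical, and the matching rule (an href ending in `.pdf`, or link text containing "pdf") is an assumption about what such buttons tend to look like, not a rule any publisher guarantees.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags that look like PDF download links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).lower()
            path = self._href.lower().split("?")[0]
            # Heuristic (assumption): a ".pdf" path or "pdf" in the link text.
            if path.endswith(".pdf") or "pdf" in label:
                self.candidates.append(urljoin(self.base_url, self._href))
            self._href = None


def find_pdf_links(html, base_url):
    """Return absolute URLs of links on the page that look like PDF downloads."""
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.candidates
```

Anything this misses can still be handed back to the user as a plain DOI URL to visit by hand.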

This is actually really relevant to a lot of problems I see so if anyone has a general solution / a 90% solution then I'm all ears :)

afandian · on Oct 26, 2015

The standard way of getting the actual PDF from a DOI, when it's a Crossref DOI (which it probably is), is to use the full-text link, available in the Crossref API.

For DOI 10.1155/2010/963926

http://api.crossref.org/works/10.1155/2010/963926

From the returned JSON message -> link -> there's the PDF!

    [
      {
        "intended-application": "text-mining",
        "content-version": "vor",
        "content-type": "application/pdf",
        "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.pdf"
      },
      {
        "intended-application": "text-mining",
        "content-version": "vor",
        "content-type": "application/xml",
        "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.xml"
      }
    ]

Publishers are still getting round to including the full-text links in metadata, but there are 16,000,000 DOIs with such data. Not all are open-access however.
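
As a rough illustration of the lookup described above, here is a sketch in Python. The function names are hypothetical; the `message` -> `link` path and the `content-type` / `URL` keys are taken from the response shown in this comment. Since many records have no full-text links at all, the helper returns an empty list rather than failing.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "http://api.crossref.org/works/"


def work_url(doi):
    """Build the API URL for one DOI (the slash in the DOI is left as-is)."""
    return API_ROOT + quote(doi)


def pdf_links(message):
    """Return the URLs of application/pdf entries in a work's `link` list."""
    return [
        entry["URL"]
        for entry in message.get("link", [])
        if entry.get("content-type") == "application/pdf" and "URL" in entry
    ]


def fetch_work(doi):
    """Fetch the record for a DOI and return its `message` part.

    This makes a network call, so it is defined here but not exercised.
    """
    with urlopen(work_url(doi)) as response:
        return json.load(response)["message"]
```

Something like `pdf_links(fetch_work("10.1155/2010/963926"))` would then give the candidate PDF URLs, keeping in mind that a link being listed does not mean the content is open access.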

When a PDF has Crossref CrossMark, the DOI is embedded in the metadata (I can't say how but I can find out)

http://www.plosone.org/article/fetchObject.action?uri=info:d...

Drop us a line on labs@crossref.org

IanCal · on Oct 26, 2015

Thanks, I'd not thought of the crossref api for this. I use the API pretty heavily for other things though, really good work!

Just noticed this part of the response:

    "affiliation": [],

How well filled in is that? I find it's currently a really poorly provided thing on many sites (although there are metatags, they're often wrong).

> When a PDF has Crossref CrossMark, the DOI is embedded in the metadata (I can't say how but I can find out)

That's likely to come in really useful, thanks.

afandian · on Oct 26, 2015

    http://api.crossref.org/works?filter=has-affiliation:true

    => total-results: 964,696
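
A small sketch of how that count could be read programmatically. The helper names are hypothetical, and two things are assumptions rather than something stated in this thread: that `rows=0` can be added to skip the result items, and that `total-results` sits under `message` in the JSON response.

```python
from urllib.parse import urlencode

WORKS_URL = "http://api.crossref.org/works"


def filter_count_url(filter_name, value, base=WORKS_URL):
    """Build a works query for one filter, asking for no result rows."""
    # safe=":" keeps the name:value separator readable in the URL.
    query = urlencode({"filter": f"{filter_name}:{value}", "rows": 0}, safe=":")
    return f"{base}?{query}"


def total_results(response):
    """Read the match count from an already-parsed API response."""
    return response["message"]["total-results"]
```

Here `filter_count_url("has-affiliation", "true")` builds the same query as above with `&rows=0` appended.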

Do ask us questions on labs@crossref.org or raise a ticket on https://github.com/CrossRef/rest-api-doc

NeatoJn · on Oct 26, 2015

Maybe consider a Google Scholar integration? Its search results sometimes include links to full-text PDFs. Even if not, extracted links to publisher websites could be helpful for a batch review of referenced articles.

afandian · on Oct 26, 2015

Not wishing to nit-pick, but a DOI is a link to the publisher website.

aroch · on Oct 26, 2015

afandian's reply below is exactly how I would go about it. Most DOIs in science papers are crossmark OR convertible to a crossmark DOI.

metachris · on Oct 26, 2015

Good point, and I will definitely take a look at that!

Would you open an issue on GitHub, and perhaps reference a few papers?

aroch · on Oct 26, 2015

Sure, I'll do it this evening