
The standard way to get the actual PDF from a DOI, when it's a Crossref DOI (which it probably is), is to use the full-text link available in the Crossref API.

For DOI 10.1155/2010/963926

http://api.crossref.org/works/10.1155/2010/963926

In the returned JSON, look under message -> link, and there's the PDF!

    [
      {
        "intended-application": "text-mining",
        "content-version": "vor",
        "content-type": "application/pdf",
        "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.pdf"
      },
      {
        "intended-application": "text-mining",
        "content-version": "vor",
        "content-type": "application/xml",
        "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.xml"
      }
    ]
Publishers are still getting round to including the full-text links in their metadata, but there are 16,000,000 DOIs with such data. Not all of them are open access, however.
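
For anyone who wants to script the lookup, here is a rough Python sketch of the steps above: fetch the work record, then pick the PDF entry out of message -> link. The function names `fetch_work` and `full_text_urls` are made up for illustration, and the `https://` root is an assumption (the URL quoted above uses `http://`). Only the field names shown in the JSON above are relied on.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API is reachable over https; the comment above used http://.
API_ROOT = "https://api.crossref.org/works/"


def fetch_work(doi):
    """Return the 'message' object for one DOI (makes a network request)."""
    url = API_ROOT + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def full_text_urls(message, content_type="application/pdf"):
    """Collect URLs of one content type from the message's 'link' list.

    Returns an empty list when the record has no full-text links at all,
    which is still the case for many DOIs.
    """
    return [
        link["URL"]
        for link in message.get("link", [])
        if link.get("content-type") == content_type
    ]


# Offline example built from the two link entries quoted above.
sample_message = {
    "link": [
        {
            "intended-application": "text-mining",
            "content-version": "vor",
            "content-type": "application/pdf",
            "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.pdf",
        },
        {
            "intended-application": "text-mining",
            "content-version": "vor",
            "content-type": "application/xml",
            "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.xml",
        },
    ]
}

pdf_urls = full_text_urls(sample_message)
```

With a live connection, `full_text_urls(fetch_work("10.1155/2010/963926"))` should do the same thing against the real record. Whether the returned URL is actually downloadable depends on the publisher, since not every linked article is open access.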

When a PDF has Crossref CrossMark, the DOI is embedded in the metadata (I can't say how but I can find out)

http://www.plosone.org/article/fetchObject.action?uri=info:d...

Drop us a line on labs@crossref.org




Thanks, I'd not thought of the crossref api for this. I use the API pretty heavily for other things though, really good work!

Just noticed this part of the response:

    "affiliation": [],
How well filled in is that? I find it's currently really poorly provided on many sites (there are meta tags, but they're often wrong).

> When a PDF has Crossref CrossMark, the DOI is embedded in the metadata (I can't say how but I can find out)

That's likely to come in really useful, thanks.


    http://api.crossref.org/works?filter=has-affiliation:true

    => total-results: 964,696
Do ask us questions on labs@crossref.org or raise a ticket on https://github.com/CrossRef/rest-api-doc
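
If you want to run that kind of count from a script, something like the sketch below might work. The helper names `count_query_url` and `total_results` are invented for illustration; `rows=0` is added on the assumption that you only want the count and not the records themselves, and the `https://` root is likewise an assumption. The `total-results` field is read from under `message`, as in the result quoted above.

```python
import urllib.parse

# Assumption: https endpoint; the query quoted above used http://.
WORKS_URL = "https://api.crossref.org/works"


def count_query_url(filters):
    """Build a /works query URL for the given filters, asking for zero rows."""
    # Filters are joined as name:value pairs separated by commas.
    filter_value = ",".join(f"{name}:{value}" for name, value in filters.items())
    # Keep ':' and ',' unescaped so the URL reads like the one quoted above.
    query = urllib.parse.urlencode({"filter": filter_value, "rows": 0}, safe=":,")
    return f"{WORKS_URL}?{query}"


def total_results(response):
    """Read the match count out of a parsed /works list response."""
    return response["message"]["total-results"]


url = count_query_url({"has-affiliation": "true"})

# Offline stand-in for a parsed response, using the count quoted above.
sample_response = {"message": {"total-results": 964696}}
count = total_results(sample_response)
```

Fetching `url` and passing the parsed JSON to `total_results` would give the live number, which will have moved on from the figure quoted here.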



