PDF Reader in JavaScript

rdamico · on June 15, 2011

At Crocodoc we tested out a similar approach using the Canvas element to render pages, but ultimately opted to go the @font-face route for a number of reasons, including:

- Near-instant client-side rendering (albeit a one-time ~5 second wait during server-side processing)

- Native font rendering (Canvas rasterizes all text and doesn't benefit from technologies like ClearType)

- Native text selection (which is important for overall UX and our annotation tools such as highlighting)

- Better performance on mobile devices (this is an ongoing project)

Here's the test file micheljansen links to (thanks!) compared with the same PDF in Crocodoc:

- Canvas rendering: http://bit.ly/kU0mlW

- Crocodoc rendering: http://bit.ly/je2wcv

Edit: By the way, we're really impressed by this canvas implementation :-)

swah · on June 15, 2011

I had never heard of crocodoc and was very impressed with this rendering. What browsers do you guys support?

peterlai · on June 15, 2011

IE7 - IE9, Firefox, Chrome, and Safari

swah · on June 15, 2011

So why hasn't Google bought them?

tomp · on June 15, 2011

Wow, I'm impressed at how well it renders math. Great job guys!

http://crocodoc.com/EfqW081

pbreit · on June 15, 2011

The only reason this project is interesting is because it does not use any server side processing (beyond serving).

bfirsh · on June 15, 2011

I expect PDFs could be rendered client-side with @font-face by using data URIs.
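
As a rough illustration of the data URI idea, here is a minimal sketch that wraps font bytes in an @font-face rule. The helper name `fontFaceRule`, the family name `EmbeddedFont0`, and the `font/opentype` MIME type are assumptions made up for the example; the bytes are placeholders rather than a real font, and extracting the font program from the PDF is not shown.

```javascript
// Hypothetical helper: wraps raw font bytes in a CSS @font-face rule
// using a base64 data: URI, so no separate font request is needed.
function fontFaceRule(family, fontBytes, mimeType) {
  // Buffer is Node-specific; a browser would need another way to
  // base64-encode the bytes.
  const base64 = Buffer.from(fontBytes).toString('base64');
  return '@font-face { font-family: "' + family + '"; ' +
         'src: url(data:' + mimeType + ';base64,' + base64 + '); }';
}

// Placeholder bytes only, not a real font file.
const rule = fontFaceRule('EmbeddedFont0', [0, 1, 0, 0], 'font/opentype');
console.log(rule);
```

In a page, a rule like this would go into a style element, and text positioned over the page could then reference the family name.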

fbnt · on June 16, 2011

Crocodoc looks great! Just out of curiosity: do you have a set of fonts ready to deliver to the client in order to make the font-face approach work? What do you do if the PDF is using some obscure font type? Are you using Poppler for server-side processing?

swah · on June 15, 2011

Do you guys also (plan to) build directly from LaTeX?

lordlarm · on June 15, 2011

Any plans on supporting Opera? The rendering on 11.11 is a little off, and canvas rendering does not work.

micheljansen · on June 15, 2011

I couldn't find a demo online and was too lazy to deal with Chrome Local Policy madness, so I temporarily put one up here: http://pumpkin.micheljansen.org/~dawuss/pdf.js/test.html

It has some obvious flaws, but it already works surprisingly well!

knowtheory · on June 15, 2011

Christ on a stick. I submitted this and an example 3 hrs ago. sigh. oh well.

http://news.ycombinator.com/item?id=2657412

on June 15, 2011

[deleted]

knowtheory · on June 15, 2011

Yeah, stuff falls off the top of "new" so easily too, since there's so much crap shoveled into it.

Edit: not sure why parent was deleted, it said that timing matters (which it does).

micheljansen · on June 15, 2011

Upvoted for pity, well, at least you got your point across :)

joshuacc · on June 15, 2011

You may wish to edit your comment to be less offensive to Christians and other religious readers. We may not be particularly vocal around here, but we do exist.

knowtheory · on June 15, 2011

No, i don't think i'll edit it (nor do i wish to), but i will apologize for offense i have caused, as that was not my intent.

Edit: I just want to be clear, you see things like "goddamnit" and other ppl using the words "christ" or "god" (in what others may see as taking them in vain) all over the place including on HN. I don't see what the big deal is there either. So really, i do mean it when i say i intended no offense.

joshuacc · on June 15, 2011

Thank you for the apology.

The distinction I would make between your comment and "goddamnit" is that while I don't particularly like either one, "goddamnit" doesn't actively mock God/people's religious beliefs about God. On the other hand "Christ on a stick" certainly seems like going out of one's way to mock the crucifixion/people with religious attachment to the crucifixion.

Does that help explain where I'm coming from?

knowtheory · on June 15, 2011

Yes, that does.

Having thought it over more, i agree that it is trivializing of the crucifixion. However this probably isn't the best venue for ruminating on the notions of Christian symbolism & semiotics.

Suffice it to say, i will retire the phrase.

GetOberIt · on June 15, 2011

Go blog about your persecution?

swah · on June 15, 2011

A throwaway account for this comment?

paisawalla · on June 15, 2011

It's probably still an asshole thing to do, but let's at least understand that if he offends you, it's because of something you've chosen to identify with.

Offending people for who they were born -- e.g. ethnicity, gender, sexual orientation -- is wrong and should be marginalized. Saying something that belittles or contradicts the particular explanation you use for understanding how the world works is a) probably unavoidable, and b) elective.

Moreover, "Christ on a stick" only makes sense because Christians have made a concerted effort over the centuries to ensure that when one thinks of American culture, Christian themes will be conjured. Thus, abusing those references while uttering profanities is expected and completely sensible, as long as one is being profane -- it's what comes immediately to mind.

karolist · on June 15, 2011

The git-cloned test.html did not work with the latest versions of Firefox, Chrome, or Safari for me, yet yours does. I find it strange that there's a policy more restrictive on local content than on remote content, given that yours works perfectly.

micheljansen · on June 15, 2011

Chrome does not allow scripts to access files on a local filesystem unless you specifically tell it to do so by starting it with --disable-web-security as an argument. See http://code.google.com/p/chromium/issues/detail?id=40787 for a discussion (and a lot of frustrated people).

lisperforlife · on June 15, 2011

To quickly start an HTTP server that serves up the files in a directory, I use Python's SimpleHTTPServer. This works on most Linux and Mac boxes, as they package Python along with the distribution or the OS.

python -m SimpleHTTPServer

This serves the files in the current folder on port 8000.
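
For anyone without Python at hand, roughly the same thing can be sketched with Node's built-in modules. The function names `resolveRequest` and `createStaticServer` and the `--serve` flag are made up for this example, and it skips Content-Type headers and directory listings, so treat it as a starting point only.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Map a request URL onto a file under `root`. Returns null when the
// URL is malformed or would escape the root directory (e.g. via "..").
function resolveRequest(root, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (e) {
    return null;
  }
  const base = path.resolve(root);
  const target = path.resolve(base, '.' + pathname);
  if (target !== base && !target.startsWith(base + path.sep)) return null;
  return target;
}

// Build (but do not start) a server that serves files below `root`.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    const file = resolveRequest(root, req.url);
    if (file === null) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      res.writeHead(200); // no Content-Type is set in this sketch
      res.end(data);
    });
  });
}

// Only listen when explicitly asked to, e.g. `node server.js --serve`.
if (process.argv[2] === '--serve') {
  createStaticServer(process.cwd()).listen(8000);
}
```

Saved as server.js (the filename is arbitrary) and started with the flag, this serves the current folder on port 8000, like the Python one-liner.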

bergie · on June 15, 2011

Something similar for Node.js: https://github.com/balupton/simple-server

stephth · on June 15, 2011

Doesn't seem to work on MobileSafari. Using iOS 4.2.1.

tiddchristopher · on June 16, 2011

Typographic ligatures (character combinations such as ff and fi) are being displayed as gray rectangles in Firefox 4 on Windows 7.

lloeki · on June 16, 2011

Same with Fx4 on OS X. Chrome 12 skips ligatures entirely (so difficult reads difcult), while Safari 5 just renders a big boxed nothing in lieu of the document. There seems to be some trouble with curly braces around the emails in the header.
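
To make the "difcult" symptom concrete, here is a small sketch. It assumes the "fi" ligature reaches the renderer as the Unicode presentation-form code point U+FB01, which is not necessarily how this PDF or its fonts encode it, and it says nothing about how pdf.js handles glyphs internally.

```javascript
// "difficult" with its second "f" and the "i" fused into the single
// "fi" ligature code point U+FB01.
const extracted = 'dif\uFB01cult';

// Dropping the ligature code points U+FB00..U+FB06 reproduces the
// symptom described above.
const dropped = extracted.replace(/[\uFB00-\uFB06]/g, '');

// NFKC compatibility normalization expands the ligature back into
// its component letters.
const normalized = extracted.normalize('NFKC');

console.log(dropped);    // difcult
console.log(normalized); // difficult
```

Normalizing like this is one way to keep such text readable and searchable when the ligature glyph itself cannot be drawn.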

bnewbold · on June 16, 2011

I've been having very good results with the Inkscape SVG renderer (inkscape -z input.pdf -l output.svg). Not pixel-perfect, but handles images and vector diagrams pretty well, and text is selectable/searchable. This would be a server-side solution, not client-side.

Gerdus · on June 15, 2011

Sweet. Scribd has an HTML5 document viewer so a PDF viewer is certainly doable. Maybe this will be included someday as a minimal PDF viewer in firefox.

Now all we need is a proper JavaScript printing API and almost all business applications can be done in the browser.

windsurfer · on June 15, 2011

But this is pure client-side. Scribd does server-side generation of the HTML5 page.

skimbrel · on June 15, 2011

Yeah. If this gets more robust and productized, it just might eat Scribd's lunch.

webXL · on June 15, 2011

And Adobe's

dualogy · on June 15, 2011

...not sure. They're not exactly charging for Adobe Reader. If end users or site owners switch to a less bloated viewer, they still keep using Adobe's PDF format.

mbrubeck · on June 16, 2011

Here's a blog post by the authors (members of Mozilla's graphics and JavaScript teams) explaining the motivation and direction of the project:

http://andreasgal.com/2011/06/15/pdf-js/

ChrisArchitect · on June 15, 2011

nice technology demo -- I like using google quick view for pdf reading...this is a nice alternative... dreaming up uses for it - immediate display and download of pdf receipts that sort of thing?

bennytheshap · on June 15, 2011

I think it's not just the Chrome local policy that's getting in the way, but that the code does an XHR request and I don't think any browser passes that through straight to the filesystem.

pcwalton · on June 15, 2011

Firefox does, if the calling web page is itself a local file and the XHR requests a file in the same directory as or in a subdirectory of that page.

tomp · on June 15, 2011

Example rendered document: http://devongovett.github.com/pdf.js/test.html

neovive · on June 16, 2011

Very interesting. Is there a performance benefit to using "const" instead of "var" to declare variables in JS?

singingfish · on June 16, 2011

Does this library provide the potential to extract the text out of the PDF for further processing?

albb0920 · on June 15, 2011

Can't find a link to a working demo, I'm just too lazy to git clone.

devongovett · on June 15, 2011

http://devongovett.github.com/pdf.js/test.html

wccrawford · on June 15, 2011

I'd like to see more, too. From the readme, that seems very experimental and not likely to render much of anything yet.

ChrisArchitect · on June 15, 2011

speaking of google ..... so much for Chrome's built in pdf viewer..heh.

Lennie · on June 15, 2011

Feels a bit like deja vu from Google Native Client, as it was proven that many, many things are already fast in JavaScript.

starwed · on June 16, 2011

The author points out in a blog post that this, being pure .js, doesn't need any sandboxing.

Also, it'll actually be used by FF for rendering PDFs in the future, apparently.

bzbarsky · on June 16, 2011

"Fast" is always a relative concept.... http://devongovett.github.com/pdf.js/test.html is about 4x faster in a tip Firefox than a tip Chrome for me.

johnx123 · on June 16, 2011

Correction: It's not "reader", but "writer" (rendering)

There are other libs out there:

- WPS: PostScript for the Web: http://logand.com/sw/wps/index.html

- jspdf: http://code.google.com/p/jspdf/

georgemcbay · on June 16, 2011

It is a "reader" from the context of the user -- it is software the user uses to read a PDF. In this context a "writer" would be software you use to author a new PDF file. I think it is safe to say this is the more widely accepted terminology on this sort of thing considering Adobe's own PDF viewer is of course called "Adobe Reader".

swah · on June 15, 2011

I also hoped this submission could trigger a discussion about the code style.

This guy is a Mozilla contributor, probably what other posts mean when they talk about "great programmers", right? Yet, for now, his implementation is a single 3000 line file.

[edited for alternate ending]

So there is this guy who writes a PDF viewer in a 3000 line file, and the guy who writes another simple web app neatly organized in 42 files... Which one would you want on your team?

I wonder how jslinux source code is organized.

jkkramer · on June 15, 2011

3000 lines of spaghetti code is one thing. But this code is pretty-well modularized. The author obviously knows what he's doing.

Maybe he finds it easier to jump around the file that way. I know I've built up projects like that. When it starts to transition from tech demo to real product, I would probably break it up into different files (util.js, lexer.js, etc). But no harm in the meantime.

A simple web app spread across 42 files might actually be terrible and over-engineered. Or it might not. The moral is: don't judge code quality by isolated metrics.

samlittlewood · on June 15, 2011

Which one would I want on my team? - the one that produced game changing results. If I need to send in some students with cattle prods and bio-hazard suits after the fact - so be it.

I have seen _many_ important demos and prototypes that were truly ear-bleeding - to the point where I would say it is a common characteristic of this sort of work.

karolist · on June 15, 2011

I think this is far from production ready code and this guy just wanted compact code. Perfectly fine for a quick copy/paste demo.

Another thought - this could be the output of his packed code, sort of like what we get when downloading mootools unminified - one big file, but that's not how authors store the code internally.

RuadhanMc · on June 15, 2011

He's getting stuff done. That's what great programmers do. Also, a PDF viewer isn't simple. If he can write a PDF viewer -- that works with a variety of PDF files -- in a short space of time, I wouldn't really mind how he organizes the code. It's the research behind the code that takes the time here.

y0ghur7_xxx · on June 15, 2011

what's wrong with a 3000 line js file? if the code is organized well, a good editor with an outline view makes jumping around 3000 lines no problem. actually I would prefer it over multiple files if I feel the code I wrote belongs to that "module".

dangoor · on June 15, 2011

Perhaps Andreas is waiting on ECMAScript Harmony modules:

https://bugzilla.mozilla.org/show_bug.cgi?id=568953

:)

drivebyacct2 · on June 15, 2011

What are you asking me to agree about? I would never get caught dead working on a 3000 line file, and honestly, that seems like a terrible way to have to add new features.

barrkel · on June 15, 2011

Some perspective:

    $ wc -l expr.c decl.c
    26638 expr.c
    34860 decl.c

kingkilr · on June 15, 2011

I'm assuming those are from GCC?

barrkel · on June 15, 2011

No, the Delphi compiler.

kingkilr · on June 16, 2011

Ah, this morning someone told me that if I wanted to get serious about my constant folding I should be aware the GCC impl was XXXX LOC (in the same neighborhood as what you posted) so I assumed :)

drivebyacct2 · on June 15, 2011

Really? Downvoted this much for simply implying that I don't appreciate 3000 line files? I didn't imply that it was bad or wrong, just that I wouldn't prefer to work with that type of style. And I hardly think that a couple of hyper counter examples takes away from my own opinion in coding style.

swah · on June 15, 2011

I apologize for editing my comment - after reading yours - to try to convey my message better. And failing to do that, and making yours kinda senseless. Will avoid that in the future.

drivebyacct2 · on June 15, 2011

Hehe. No problem, I was just generally blown away. Loading the full thread back up makes it more obvious now. Oh well, I get edit happy too!