PDF Reader in JavaScript (github.com/andreasgal)
221 points by swah on June 15, 2011 | 65 comments



At Crocodoc we tested out a similar approach using the Canvas element to render pages, but ultimately opted to go the @font-face route for a number of reasons, including:

- Near-instant client-side rendering (albeit a one-time ~5 second wait during server-side processing)

- Native font rendering (Canvas rasterizes all text and doesn't benefit from technologies like ClearType)

- Native text selection (which is important for overall UX and our annotation tools such as highlighting)

- Better performance on mobile devices (this is an ongoing project)

Here's the test file micheljansen links to (thanks!) compared with the same PDF in Crocodoc:

- Canvas rendering: http://bit.ly/kU0mlW

- Crocodoc rendering: http://bit.ly/je2wcv

Edit: By the way, we're really impressed by this canvas implementation :-)


I had never heard of crocodoc and was very impressed with this rendering. What browsers do you guys support?


IE7 - IE9, Firefox, Chrome, and Safari


So why hasn't Google bought them?


Wow, I'm impressed at how well it renders math. Great job guys!

http://crocodoc.com/EfqW081


The only reason this project is interesting is because it does not use any server side processing (beyond serving).


I expect PDFs could be rendered client-side with @font-face by using data URIs.
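A rough sketch of the data-URI idea, in Python for brevity: base64-encode a font program and wrap it in an @font-face rule. The helper name `font_face_rule`, the family name `EmbeddedFont0`, the `font/ttf` MIME type, and the placeholder bytes are all assumptions for illustration. Pulling the font out of the PDF is not shown.

```python
import base64


def font_face_rule(family, font_bytes, mime="font/ttf"):
    """Return a CSS @font-face rule embedding the font as a base64 data URI."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("data:{mime};base64,{encoded}");\n'
        "}"
    )


# Placeholder bytes stand in for a real font program extracted from a PDF.
rule = font_face_rule("EmbeddedFont0", b"\x00\x01\x00\x00")
print(rule)
```

The resulting rule could be injected into a style element on the client, so no separate font request is needed.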


Crocodoc looks great! Just out of curiosity: do you have a set of fonts ready to deliver to the client in order to make the font-face approach work? What do you do if the PDF is using some obscure font type? Are you using Poppler for server-side processing?


Do you guys also (plan to) build directly from LaTeX?


Any plans on supporting Opera? The rendering on 11.11 is a little off, and canvas rendering does not work.


I couldn't find a demo online and was too lazy to deal with Chrome Local Policy madness, so I temporarily put one up here: http://pumpkin.micheljansen.org/~dawuss/pdf.js/test.html

It has some obvious flaws, but it already works surprisingly well!


Christ on a stick. I submitted this and an example 3 hrs ago. sigh. oh well.

http://news.ycombinator.com/item?id=2657412


[deleted]


Yeah, stuff falls off the top of "new" so easily too, since there's so much crap shoveled into it.

Edit: not sure why parent was deleted, it said that timing matters (which it does).


Upvoted for pity, well, at least you got your point across :)


You may wish to edit your comment to be less offensive to Christians and other religious readers. We may not be particularly vocal around here, but we do exist.


No, i don't think i'll edit it (nor do i wish to), but i will apologize for offense i have caused, as that was not my intent.

Edit: I just want to be clear, you see things like "goddamnit" and other ppl using the words "christ" or "god" (in what others may see as taking them in vain) all over the place including on HN. I don't see what the big deal is there either. So really, i do mean it when i say i intended no offense.


Thank you for the apology.

The distinction I would make between your comment and "goddamnit" is that while I don't particularly like either one, "goddamnit" doesn't actively mock God/people's religious beliefs about God. On the other hand "Christ on a stick" certainly seems like going out of one's way to mock the crucifixion/people with religious attachment to the crucifixion.

Does that help explain where I'm coming from?


Yes, that does.

Having thought it over more, i agree that it is trivializing of the crucifixion. However this probably isn't the best venue for ruminating on the notions of Christian symbolism & semiotics.

Suffice it to say, i will retire the phrase.


Go blog about your persecution?


A throwaway account for this comment?


It's probably still an asshole thing to do, but let's at least understand that if he offends you, it's because of something you've chosen to identify with.

Offending people for who they were born -- e.g. ethnicity, gender, sexual orientation -- is wrong and should be marginalized. Saying something that belittles or contradicts the particular explanation you use for understanding how the world works is a) probably unavoidable, and b) elective.

Moreover, "Christ on a stick" only makes sense because Christians have made a concerted effort over the centuries to ensure that when one thinks of American culture, Christian themes will be conjured. Thus, abusing those references while uttering profanities is expected and completely sensible, as long as one is being profane -- it's what comes immediately to mind.


The git-cloned test.html did not work for me with the latest versions of Firefox, Chrome, or Safari, yet yours does. I find it strange that the policy is more restrictive for local content than for remote content, given that yours works perfectly.


Chrome does not allow scripts to access files on a local filesystem unless you specifically tell it to do so by starting it with --disable-web-security as an argument. See http://code.google.com/p/chromium/issues/detail?id=40787 for a discussion (and a lot of frustrated people).


To simply start an HTTP server that serves up the files in a directory, I use Python's SimpleHTTPServer. This works on most Linux and Mac boxes, as they package Python along with the distribution or the OS.

    python -m SimpleHTTPServer

This serves the files in the current folder on port 8000.
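On Python 3 the `SimpleHTTPServer` module no longer exists under that name. Its handler lives in `http.server`, and the command-line equivalent is `python3 -m http.server 8000`. If you'd rather start it from a script, something like this should work. It's a minimal sketch: `serve_directory` is a made-up helper name, and it needs Python 3.7+ for the `directory` argument.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost from a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `serve_directory(".")` to serve the current folder, keep the process alive while you test, and call `server.shutdown()` on the returned object to stop it. Passing `port=0` lets the OS pick a free port, which you can read back from `server.server_address[1]`.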


Something similar for Node.js: https://github.com/balupton/simple-server


Doesn't seem to work on MobileSafari. Using iOS 4.2.1.


Typographic ligatures (character combinations such as ff and fi) are being displayed as gray rectangles in Firefox 4 on Windows 7.


Same with Fx4 on OS X. Chrome 12 skips ligatures entirely (so "difficult" reads "difcult"), while Safari 5 just renders a big boxed nothing in lieu of the document. There seems to be some trouble with curly braces around the emails in the header.


I've been having very good results with the Inkscape SVG renderer (inkscape -z input.pdf -l output.svg). Not pixel-perfect, but handles images and vector diagrams pretty well, and text is selectable/searchable. This would be a server-side solution, not client-side.


Sweet. Scribd has an HTML5 document viewer so a PDF viewer is certainly doable. Maybe this will be included someday as a minimal PDF viewer in Firefox.

Now all we need is a proper JavaScript printing API and almost all business applications can be done in the browser.


But this is pure client-side. Scribd does server-side generation of the HTML5 page.


Yeah. If this gets more robust and productized, it just might eat Scribd's lunch.


And Adobe's


...not sure. They're not exactly charging for Adobe Reader. If end users or site owners switch to a less bloated viewer, they still keep using Adobe's PDF format.


Here's a blog post by the authors (members of Mozilla's graphics and JavaScript teams) explaining the motivation and direction of the project:

http://andreasgal.com/2011/06/15/pdf-js/


nice technology demo -- I like using google quick view for pdf reading...this is a nice alternative... dreaming up uses for it - immediate display and download of pdf receipts that sort of thing?


I think it's not just the Chrome local policy that's getting in the way, but that the code does an XHR request and I don't think any browser passes that through straight to the filesystem.


Firefox does, if the calling web page is itself a local file and the XHR requests a file in the same directory as or in a subdirectory of that page.
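To make the rule concrete, here is a toy model of the check as described in the comment above, written in Python. This is not Firefox's actual code. The function name `is_same_dir_or_below` and the example paths are hypothetical, and the sketch ignores symlinks and non-POSIX paths.

```python
import posixpath


def is_same_dir_or_below(page_path, target_path):
    """Return True if target_path is in the page's directory or a subdirectory."""
    # Normalize first so that ".." segments cannot escape the page's directory.
    page_dir = posixpath.dirname(posixpath.normpath(page_path))
    target = posixpath.normpath(target_path)
    prefix = page_dir.rstrip("/") + "/"
    return target.startswith(prefix)


page = "/home/me/pdf.js/test.html"
print(is_same_dir_or_below(page, "/home/me/pdf.js/sample.pdf"))
print(is_same_dir_or_below(page, "/home/me/pdf.js/docs/sample.pdf"))
print(is_same_dir_or_below(page, "/home/me/pdf.js/../secret.txt"))
```

Under this model the first two requests are allowed and the third is refused, because the target normalizes to a path outside the page's directory.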



Very interesting. Is there a performance benefit to using "const" instead of "var" to declare variables in JS?


Does this library provide the potential to extract the text out of the PDF for further processing?


Can't find a link to a working demo, and I'm just too lazy to git clone.



I'd like to see more, too. From the readme, that seems very experimental and not likely to render much of anything yet.


speaking of google ..... so much for Chrome's built in pdf viewer..heh.


Feels a bit like déjà vu from Google Native Client, since it has been shown that many, many things are already fast in JavaScript.


The author points out in a blog post that this, being pure .js, doesn't need any sandboxing.

Also, it'll actually be used by FF for rendering PDFs in the future, apparently.


"Fast" is always a relative concept.... http://devongovett.github.com/pdf.js/test.html is about 4x faster in a tip Firefox than a tip Chrome for me.


Correction: It's not "reader", but "writer" (rendering)

There are other libs out there:

- WPS: PostScript for the Web: http://logand.com/sw/wps/index.html

- jspdf: http://code.google.com/p/jspdf/


It is a "reader" from the context of the user -- it is software the user uses to read a PDF. In this context a "writer" would be software you use to author a new PDF file. I think it is safe to say this is the more widely accepted terminology on this sort of thing considering Adobe's own PDF viewer is of course called "Adobe Reader".


I also hoped this submission could trigger a discussion about the code style.

This guy is a Mozilla contributor, probably what other posts mean when they talk about "great programmers", right? Yet, for now, his implementation is a single 3000 line file.

[edited for alternate ending]

So there is this guy who writes a PDF viewer in a 3000 line file, and the guy who writes another simple web app neatly organized in 42 files... Which one would you want on your team?

I wonder how jslinux source code is organized.


3000 lines of spaghetti code is one thing. But this code is pretty well modularized. The author obviously knows what he's doing.

Maybe he finds it easier to jump around the file that way. I know I've built up projects like that. When it starts to transition from tech demo to real product, I would probably break it up into different files (util.js, lexer.js, etc). But no harm in the meantime.

A simple web app spread across 42 files might actually be terrible and over-engineered. Or it might not. The moral is: don't judge code quality by isolated metrics.


Which one would I want on my team? - the one that produced game changing results. If I need to send in some students with cattle prods and bio-hazard suits after the fact - so be it.

I have seen _many_ important demos and prototypes that were truly ear-bleeding - to the point where I would say it is a common characteristic of this sort of work.


I think this is far from production ready code and this guy just wanted compact code. Perfectly fine for a quick copy/paste demo.

Another thought - this could be the output of his packed code, sort of like what we get when downloading mootools unminified - one big file, but that's not how authors store the code internally.


He's getting stuff done. That's what great programmers do. Also, a PDF viewer isn't simple. If he can write a PDF viewer -- that works with a variety of PDF files -- in a short space of time, I wouldn't really mind how he organizes the code. It's the research behind the code that takes the time here.


what's wrong with a 3000 line js file? if the code is organized well, a good editor with an outline view makes jumping around 3000 lines no problem. actually I would prefer it over multiple files if I feel the code I wrote belongs to that "module".


Perhaps Andreas is waiting on ECMAScript Harmony modules:

https://bugzilla.mozilla.org/show_bug.cgi?id=568953

:)


What are you asking me to agree about? I would never get caught dead working on a 3000 line file, and honestly, that seems like a terrible way to have to add new features.


Some perspective:

    $ wc -l expr.c decl.c
    26638 expr.c
    34860 decl.c


I'm assuming those are from GCC?


No, the Delphi compiler.


Ah, this morning someone told me that if I wanted to get serious about my constant folding I should be aware the GCC impl was XXXX LOC (in the same neighborhood as what you posted) so I assumed :)


Really? Downvoted this much for simply implying that I don't appreciate 3000 line files? I didn't imply that it was bad or wrong, just that I wouldn't prefer to work with that type of style. And I hardly think that a couple of extreme counterexamples take away from my own opinion on coding style.


I apologize for editing my comment - after reading yours - to try to convey my message better. And failing to do that, and making yours kinda senseless. Will avoid that in the future.


Hehe. No problem, I was just generally blown away. Loading the full thread back up makes it more obvious now. Oh well, I get edit happy too!



