Pdf.js: PDF Reader in JavaScript (mozilla.github.com)
371 points by pykello on Sept 14, 2012 | 90 comments



I'm really excited about this for http://www.sharelatex.com and other similar sites that actually generate a PDF for you. With native PDF viewers there is no way to interact with the viewer via JavaScript, and even just having the viewer stay on the same page when you reload a document (with minor changes) is impossible. Pdf.js means that we'll be able to do this easily, as well as other cool things like letting the user sync between the PDF and source.


Our web-based IDE[1] does exactly this, including syncing between source and PDF, using PDF.js. Works great for the most part!

[1] http://rstudio.org


Are you guys actually generating PDFs from user input? If so, I'd be really interested in finding out how you do it if you don't mind sharing. I've found a number of apps that allow you to generate PDFs server side but often the installation is near impossible (dependency issues), the rendering is not good enough, or the API makes using it with a web app clunky. I did find a solution (don't recall the name) that actually used Webkit to render web pages and then turned them into PDFs but I was never able to get it to work in an app. It only seemed to work some of the time and only through ssh. But if you can't say, then I can respect that.


(Just saw this comment, sorry for not replying earlier)

We use LaTeX to generate the PDF. You could try pandoc if you're going from HTML (or markdown) to PDF.


nice, this is quite a sophisticated solution! have you heard of airxcell? they're doing something similar (with spreadsheets).


Hadn't heard of airxcell, looks interesting. Thanks!


I used to use PDF.js for a while (on Linux), until I switched to KParts because it was having difficulty rendering certain kinds of PDF documents. KParts uses the same underlying engine that powers Okular (KDE's default PDF reader.) It renders everything properly and is much faster than PDF.js. It reminded me of Foxit on Windows. KParts might be only available on Linux though...


I use this a lot, and it really does work. It renders everything, and renders it well. The one thing that doesn't work is maps – just too many vectors, and Javascript/Canvas/etc just can't keep up. Otherwise I'm very happy and don't feel nearly as much resentment towards PDFs as I used to.


(I assume you are mentally comparing pdf.js to Adobe Reader.)

Have we reached the point where Adobe Reader is so resource-guzzling that people prefer a JavaScript (gasp!) PDF interpreter to Reader? Wow.


Yes, I really love PDF.js. I was considering switching to Chrome just because of the (proprietary) pdf viewer. But the Chrome PDF viewer even lacks the concept of pages! I had already kicked out the Adobe Reader plugin but using an external viewer every time makes PDF just so much more annoying.

Mozilla is also working on something similar for Flash: https://github.com/mozilla/shumway . I'm a bit sceptical but with the success of PDF.js I can't wait to use it.


Another Ian here, also using pdf.js. Been working on a paperless office app that allows contracts to be signed on an iPad. It works surprisingly well.


Firefox already bundles the pdf.js reader. See https://bugzil.la/714712.


Two questions:

1) In Firefox 15, the demo page adds two new options to my right click menu: Rotate clockwise and Rotate Counter-clockwise. Is Firefox recognizing pdf.js (since it appears that they are related) or pdf.js adding menu options? I didn't know JS could do that.

2) Isn't Javascript an embeddable language inside PDFs? I'm pretty sure I read that javascript is used, not necessarily for animations but for run-time dynamic layouts. If that's true, is pdf.js "eval"-ing that javascript?



Thanks.

I've seen custom contextmenus generated on pages[0] that override the built-in context menu but for some reason I've yet to come across a page[1] that adds to the existing context menu until today.

[0] replacing example: http://developer.yahoo.com/yui/examples/menu/tablecontextmen...

[1] appending example: https://bug617528.bugzilla.mozilla.org/attachment.cgi?id=554...


>but for some reason I've yet to come across a page that adds to the existing context menu until today.

the reason is likely because browser compatibility for the context menu js api is pretty minimal [1]. firefox can use it on pages that are only going to be displayed in firefox, but for the internet at large it's pretty useless.

[1]http://caniuse.com/menu


What about Google Docs? Their context menu seems to work cross browser (even on ie9!). Maybe my assumption that they're using HTML5 for this is wrong but if I'm right then what you see on Gdocs is a complete replacement of the regular menu which is pretty cool but also quite annoying for me at times. Please correct me if I'm wrong (which is very likely).


Google docs is not appending stuff to the browser's context menu. they're catching the right-click action and replacing the standard context menu with their own one drawn in HTML.


This is super cool.

While I can't imagine myself using it anytime soon, it's clear that web applications are improving at a far faster rate than native applications and, with t large enough, the first derivative means that web will eclipse native.

This seems like an academic exercise at the moment; it only proves that you can replicate a native experience.

However, it seems that this could be vastly improved by playing to the strengths of the internet. The only online apps that have beat native ones so far have been because of cloud storage and collaboration. First, use filepicker.io or something so this can open my online files. Second, bake some collaboration into it.


GroupDocs currently provides an app for online annotation and collaboration, including access to your files from different cloud storage providers; Azure and Amazon S3 are currently supported: http://groupdocs.com/apps/annotation/try-it-now


XSS injections on these are gonna be fun..


I came across this a few months ago while trying to implement a solution for turning HTML into PDFs server side. This is definitely cool and useful but its usefulness is limited for now as native PDF readers on the desktop are preferable. Even on iOS the built-in reader is nicely done. Chrome on Windows and Mac always opens PDFs in a tab and handles them well, I think. That said, this can definitely be of use in Chromebook-type situations. I'm sure it'll end up in Firefox OS too, which I have high expectations for. The awesome thing about Firefox OS is that it's all JavaScript and good old-fashioned web technologies under the hood so this will fit right in.

So alas, I'm still searching for an easy way to convert HTML to PDF server or client side. I haven't looked at the code yet but I do wonder if one could get that functionality out of this if they wrestled with it enough. (I know there are other ways to turn HTML to PDF but a client or server side script to do so really is the best solution for my situation).


This is about PDF to HTML conversion.


Right, I got that. But when I found it I was hoping it could work in the opposite direction.


As a big user of PDF.js, I have to say it's great for basic PDF documents. However, complex vectors don't render nicely with this


Now if someone would just do this for .docx, .xlsx, and such I'd be set.


GroupDocs provides a commercial online viewer and APIs for supporting those formats as well as .pdf


That is awesome! link for the lazy: http://groupdocs.com/


http://groupdocs.com/ seems to cope with a whole bunch of file formats.


Is there a way you could tell your browser to open those links in google docs by default?


Proxying all your documents to Google is something that should be avoidable.


True. And possibly undesirable depending on the sensitivity of the files.


Maybe Web Intents?


I first saw this a year ago (when it was publicly announced, IIRC). It was a cute tech demo but easily broken, and quite slow.

This ... is mind blowing.


Just as interesting, in my mind, is the inverse library -- jspdf [1] lets you create pdfs in javascript. For automatic document generation, I find I can quickly whip something up in jsbin or jsfiddle that will give me a pdf I can download and do whatever I want with.

[1] http://jspdf.com/


I presume that copy to clipboard could be added to this as well, yes? Cool project.


I have this error with Firefox 9.0.1:

currentPage is undefined http://mozilla.github.com/pdf.js/web/viewer.js Line 285


I'd go with "That's because any version of Firefox below 15 is unsupported".


You should update to Firefox 15. It's better than 9, and secure too.


i stumbled upon PDF.js a while ago because i was looking for a js tool that allows me to extract data from pdfs... sadly i'm still looking for a good lib to do just that.


Apache Tika. Works on many filetypes and languages.


If you're looking for a web-based API for manipulating PDFs, SaaSpose provides a commercially supported solution


iText has a lot of tools for extracting info from PDFs. We use it extensively.


I have used iText(sharp) for several years. It works very well.


Yeah we use the C# version as well, I've done some hacking on a Clojure wrapper around the java version. it's nontrivial :P


This is great, I love it. I can read PDF files without leaving the browser, in any browser. I find it slightly distracting to switch to a third-party application.


from someone who has worked a lot with PDF, excellent work.


It has been in Firefox since version 14, but is disabled by default. It can be enabled with "preview in Firefox" in options/filetypes/pdf.


Next up, a good ePub reader for use in firefox and firefoxos. Breaking free of the only two rendering engines in use...


On an iOS device with retina display it's awfully blurry, probably just not rendering to a big enough canvas.


Feels like they aren't taking window.devicePixelRatio into account so that they end up with a legitimately retina-sized canvas... My canvases were always blurry like this until I took dpr into account.
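
A minimal sketch of the dpr adjustment described above, written as a pure function so it can be checked outside a browser. The helper name and the returned shape are hypothetical and are not taken from pdf.js; only `window.devicePixelRatio`, `canvas.width`/`canvas.height`, the canvas CSS size, and `ctx.scale()` are real browser APIs.

```typescript
// Hypothetical helper (not part of pdf.js): work out how large the canvas
// backing store must be so that one canvas pixel maps to one device pixel.
interface CanvasSize {
  width: number;     // backing-store width in device pixels (canvas.width)
  height: number;    // backing-store height in device pixels (canvas.height)
  cssWidth: number;  // displayed width in CSS pixels (canvas.style.width)
  cssHeight: number; // displayed height in CSS pixels (canvas.style.height)
  scale: number;     // factor to pass to ctx.scale(scale, scale) before drawing
}

function sizeCanvasForDpr(cssWidth: number, cssHeight: number, dpr: number): CanvasSize {
  // Fall back to 1 when the ratio is missing, zero, negative, or NaN.
  const ratio = dpr > 0 ? dpr : 1;
  return {
    width: Math.round(cssWidth * ratio),
    height: Math.round(cssHeight * ratio),
    cssWidth: cssWidth,
    cssHeight: cssHeight,
    scale: ratio,
  };
}
```

In a page, one would presumably call this with `window.devicePixelRatio`, assign `width`/`height` to the canvas element, set its CSS size to `cssWidth`/`cssHeight`, and call `ctx.scale(scale, scale)` so drawing code can keep using CSS-pixel coordinates.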


This is amazing, but sadly slow.


On linux, anything is better than that crappy 32-bit only proprietary plugin that hangs the browser for a second every time I want to open a pdf file.


Works nicely since it is Firefox's default viewer. No more need to install one, yay!


Sweet. Going to keep this in mind for when my site needs a pdf viewer. Awesome tool


This is pretty cool! Does anyone know if there is support for PDF annotations?


Looks like it has basic support for annotations, but is limited in its scope. I'm hoping I can use this to read annotations from PDFs so I can hook it into a journal paper reading/organisation workflow.


The demo looks really impressive, well done. Adding this to my toolbox! :)


On Firefox OS this will be the default PDF viewer.


Good Job!!


1) From a technical perspective, this is damn cool - exceedingly well done. Color me very impressed.

2) I hope I never actually see anyone using this on a website, attempting to make things "easier." Between Scribd and Slideshare, and Adobe trying to force its hideous crash-prone plugins into my browser, there are already enough people making a mess out of what is one of the more well-thought-out aspects of OS X. Give me a link to a PDF, which Preview.app handles in wonderful fashion any day.

3) It would make a sweet browser plugin on browser-in-a-box platforms and other platforms that don't have a nice native implementation (which upon further reading seems to be the goal).


Personally I wish all sites used this instead of scribd/slideshare or opening up external programs.

This will just work on different browsers, different OS etc.

Plus you can send links like this: http://mozilla.github.com/pdf.js/web/viewer.html#page=6&... to send a user to a specific part of the page. Right now you usually just send the pdf link and have to say, "check out the image on page 6".

Keeping the pdf in the browser and accessible to and from javascript opens up a world of possibilities.
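
As a rough illustration of how a viewer could act on a `#page=6` style fragment like the one in the link above, here is a small sketch. The function names are hypothetical and this is not the parsing code pdf.js itself uses; the parameters after `page` in the original link are elided there, so only `page` is handled here.

```typescript
// Hypothetical helpers (not taken from pdf.js): split a URL fragment such as
// "#page=6" into key/value pairs.
function parseFragment(hash: string): Record<string, string> {
  const params: Record<string, string> = {};
  const body = hash.charAt(0) === "#" ? hash.slice(1) : hash;
  for (const part of body.split("&")) {
    if (part === "") continue; // skip empty segments, e.g. from "a=1&&b=2"
    const eq = part.indexOf("=");
    const key = eq === -1 ? part : part.slice(0, eq);
    const value = eq === -1 ? "" : part.slice(eq + 1);
    // Note: decodeURIComponent throws on malformed escapes such as "%zz".
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return params;
}

// Pick the page to show: default to page 1 and clamp to the document length.
function pageFromFragment(hash: string, pageCount: number): number {
  const raw = parseFragment(hash)["page"];
  const page = raw === undefined ? NaN : parseInt(raw, 10);
  if (isNaN(page)) return 1;
  return Math.min(Math.max(page, 1), pageCount);
}
```

A page could call `pageFromFragment(window.location.hash, pageCount)` on load and again on the `hashchange` event, so that a shared link lands on the intended page.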


> This will just work on different browsers, different OS etc.

For values of "different" that mean "as long as they all have fast processors and accelerated JavaScript". I tried this in a non-Apple browser on my iPad, which means no JITted Javascript interpreter, and it basically hung. I haven't even attempted it on my smartphone, which does have a native PDF-reading application.


I think Firefox has been shipping with this for a while to view PDFs. It may be an about:config value you have to turn on though.


I only noticed it in FF15, but it may have been there for longer. Change pdfjs.disabled to false in about:config (followed by a browser restart) if you want to turn it on. Here's a test file[0] should you wish to test it out.

I've been using pdf.js for a while on my Windows box (where it is enabled by default on the FF beta channel) and it has performed remarkably well. Sometimes it's a bit slower than Adobe Reader or messes up colour or formatting, but on the whole I've quite enjoyed using it.

[0]: https://svn.torproject.org/svn/projects/design-paper/tor-des...

Note that the text in [0] doesn't scale properly, and the page previews in the navigation pane just show garbled text. Usable, but only just.


Sweet! (It voids the non-existent FF warranty though...)

What's better is, it allows the (awesome) extension Pentadactyl to click the buttons in the viewer (since it's part of the browser). You have to click on (or tab to?) the document itself, though, in order to be able to use keyboard navigation on it :-/


I've been using it for a few weeks now, and have noticed that it:

- renders very well

- is extremely slow on large documents with images, but pretty good with text-only ones

- offers a more integrated experience than Adobe's plugin

I think it's very promising, and I look forward to seeing it become more polished.


It is included, but off in release versions at the moment.

In about:config :

name: pdfjs.disabled status: default type: boolean value: false


I'm no usability expert, but to me it would make more sense to name that parameter "pdfjs.enabled" true/false


about:config isn't really optimized for usability and there's some advantage to the double negative in that the absence of a value could be treated as a false boolean value.

So if at one point pdfjs should be enabled by default, they can just get rid of the preference in the default preferences file altogether instead of having to ship one where it says enabled: true (because absence of the value would be treated as false)
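
The argument above can be sketched as follows. This is an illustration of the reasoning only, under the assumption that a missing boolean preference reads as false; the types and function names are hypothetical and this is not Firefox's actual preference code.

```typescript
// Hypothetical preference store: a name maps to a boolean, or is absent.
type Prefs = { [name: string]: boolean | undefined };

// Assumption from the comment above: an absent boolean preference reads as false.
function getBoolPref(prefs: Prefs, name: string): boolean {
  return prefs[name] === true;
}

// With a "disabled" flag, deleting the preference entirely turns the viewer on.
function isPdfViewerEnabled(prefs: Prefs): boolean {
  return !getBoolPref(prefs, "pdfjs.disabled");
}
```

Under that assumption an empty preferences object yields an enabled viewer, whereas a hypothetical "pdfjs.enabled" flag would have to be shipped explicitly as true to get the same result.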


Does that mean I can just remove the Firefox extension installed from https://addons.mozilla.org/en-US/firefox/addon/pdfjs/ and then turn on this setting to enable a built-in PDF viewer?


>Give me a link to a PDF, which Preview.app handles in wonderful fashion any day.

Really? I can't stand that. My download folder ends up cluttered with (inevitably weirdly named) PDF files. Worse, I suddenly am switching between programs and tabs, rather than just tabs, and the two programs have very different search interfaces (and Preview.app's is downright clumsy). Since I rarely look at a PDF if I'm not researching something, navigation and searching are really important considerations.

I'm quite happy with the Chrome PDF viewer though.


For me it doesn't put anything in my download folder. It opens in the browser in a reasonable, non-resource-abusive fashion. I'm on Safari 6/Mountain Lion, so I'm not sure in what version this changed.


OK, that's new in Mountain Lion apparently (Lion was the OS X that chased me away). I assumed it was the same as in previous versions since you mentioned Preview.app, but it appears that Preview.app isn't involved anymore when you open links in Safari; it just uses a built-in PDF viewer the same way Chrome does.


A few months after this was first shown on HN Chrome started shipping with a built-in pdf reader that looks exactly the same on both MacOS and Linux.

I always assumed it was pdf.js


No, Chrome has its own binary plugin. IIRC it is closed source and not available in Chromium.

Pdf.js is developed by Mozilla and now shipped with Firefox and it works quite well so far.


That's correct. I use Chromium on Arch Linux and it doesn't come with the proprietary pdf reader by default.


You said "by default" which is correct but just for the sake of completeness, the built in Chrome PDF reader can easily be added to Chromium on Arch using this package from the AUR: https://aur.archlinux.org/packages.php?ID=44148

I've been using it for almost a year now without issue.


Me too! In fact, that's why I said "by default" :)

Similarly, Pepper Flash Player can also be integrated from AUR.


There is also a Google Chrome/Chromium extension also provided by the pdf.js team. (It's not in the webstore, though.)

I've used it for about half a year and I like it a lot. (Not having to download PDFs all the time and having PDFs in tabs alongside the sites I read.)


I agree. Separate windows for PDFs are terrible.

You can also separately download the proprietary PDF reader of Chrome and use it with Chromium. If you are using Arch Linux, it is available in the repositories.


I believe all the other replies to your comment to be incorrect.

Both Chrome and Android use the skia (http://code.google.com/p/skia/ ) library as far as i am aware (see http://code.google.com/searchframe#OAMlx_jo-ck/src/third_par... )


This is not a PDF reader but a graphics library.



That code is for drawing to a pdf.


I believe Chrome's pdf plugin is based on Foxit


I found Preview.app always annoying. E.g., the maximize windows button not maximizing the window. Sure it is better than Adobe Reader. But there are quite a few very nice alternative PDF viewers better than it. And pdf.js is great as a browser plugin because opening PDFs won't disturb your browsing any more.


I agree about Preview - but mainly I just hate having to download PDFs to read them. Most of the time I don't need them long-term.


I wouldn't say it's exceedingly well done... yet. It is so slow it is unusable on my OC'd core i5, and can't search pages you haven't looked at yet. The last one is a deal breaker for me. Really the only reason I use chrome for my day to day browser is for the PDF plugin. Once this project matures I'm out.


How well does printing work when PDF.js handles the rendering?



