At Crocodoc we tested out a similar approach using the Canvas element to render pages, but ultimately opted to go the @font-face route for a number of reasons, including:
- Near-instant client-side rendering (albeit a one-time ~5 second wait during server-side processing)
- Native font rendering (Canvas rasterizes all text and doesn't benefit from technologies like ClearType)
- Native text selection (which is important for overall UX and our annotation tools such as highlighting)
- Better performance on mobile devices (this is an ongoing project)
Here's a comparison of the test file micheljansen links to (thanks!) compared with the same PDF in Crocodoc:
Crocodoc looks great! Just out of curiosity: do you have a set of fonts ready to deliver to the client in order to make the font-face approach work? What do you do if the PDF is using some obscure font type? Are you using Poppler for server-side processing?
You may wish to edit your comment to be less offensive to Christians and other religious readers. We may not be particularly vocal around here, but we do exist.
No, I don't think I'll edit it (nor do I wish to), but I will apologize for any offense I have caused, as that was not my intent.
Edit: I just want to be clear, you see things like "goddamnit" and other people using the words "christ" or "god" (in what others may see as taking them in vain) all over the place, including on HN. I don't see what the big deal is there either. So really, I do mean it when I say I intended no offense.
The distinction I would make between your comment and "goddamnit" is that while I don't particularly like either one, "goddamnit" doesn't actively mock God/people's religious beliefs about God. On the other hand "Christ on a stick" certainly seems like going out of one's way to mock the crucifixion/people with religious attachment to the crucifixion.
Having thought it over more, I agree that it trivializes the crucifixion. However, this probably isn't the best venue for ruminating on the notions of Christian symbolism & semiotics.
It's probably still an asshole thing to do, but let's at least understand that if he offends you, it's because of something you've chosen to identify with.
Offending people for who they were born -- e.g. ethnicity, gender, sexual orientation -- is wrong and should be marginalized. Saying something that belittles or contradicts the particular explanation you use for understanding how the world works is a) probably unavoidable, and b) elective.
Moreover, "Christ on a stick" only makes sense because Christians have made a concerted effort over the centuries to ensure that when one thinks of American culture, Christian themes will be conjured. Thus, abusing those references while uttering profanities is expected and completely sensible, as long as one is being profane -- it's what comes immediately to mind.
The git-cloned test.html did not work for me in the latest versions of Firefox, Chrome, or Safari, yet yours does. I find it strange that the policy for local content is more restrictive than the one for remote content, given that yours works perfectly.
Chrome does not allow scripts to access files on a local filesystem unless you specifically tell it to do so by starting it with --disable-web-security as an argument.
See http://code.google.com/p/chromium/issues/detail?id=40787 for a discussion (and a lot of frustrated people).
To simply start an arbitrary HTTP server to serve up the files in a directory, I use Python's SimpleHTTPServer. This works on most Linux and Mac boxes, as they ship Python with the distribution or the OS.
python -m SimpleHTTPServer
This serves the files off the current folder under port 8000.
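For anyone on Python 3, where the SimpleHTTPServer module was folded into http.server (the one-liner becomes python3 -m http.server), here is a minimal sketch of the same idea as a script. The function name make_server and its defaults are my own, purely illustrative, and unlike the one-liner this sketch binds to 127.0.0.1 only, which is an assumption on my part that you only need the files locally.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build (but do not start) an HTTP server for the files in `directory`.

    `make_server` is a hypothetical helper name, not part of the standard
    library. Passing port=0 lets the OS pick a free port.
    """
    # SimpleHTTPRequestHandler accepts a `directory` argument since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Assumption: loopback only, so the files are not exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_server(".", 8000).serve_forever() from the checkout directory should then let you open http://127.0.0.1:8000/test.html, so the page's requests go over HTTP rather than hitting the local-file restrictions.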
Same with Fx4 on OS X. Chrome 12 skips ligatures entirely (so difficult reads difcult), while Safari 5 just renders a big boxed nothing in lieu of the document.
There seems to be some trouble with curly braces around the emails in the header.
I've been having very good results with the Inkscape SVG renderer (inkscape -z input.pdf -l output.svg). Not pixel-perfect, but handles images and vector diagrams pretty well, and text is selectable/searchable.
This would be a server-side solution, not client-side.
Sweet. Scribd has an HTML5 document viewer so a PDF viewer is certainly doable. Maybe this will be included someday as a minimal PDF viewer in Firefox.
Now all we need is a proper JavaScript printing API and almost all business applications can be done in the browser.
...not sure. They're not exactly charging for Adobe Reader. If end users or site owners switch to a less bloated viewer, they still keep using Adobe's PDF format.
Nice technology demo. I like using Google Quick View for PDF reading, and this is a nice alternative. Dreaming up uses for it: immediate display and download of PDF receipts, that sort of thing?
I think it's not just the Chrome local policy that's getting in the way, but that the code does an XHR request and I don't think any browser passes that through straight to the filesystem.
It is a "reader" from the context of the user -- it is software the user uses to read a PDF. In this context a "writer" would be software you use to author a new PDF file. I think it is safe to say this is the more widely accepted terminology on this sort of thing considering Adobe's own PDF viewer is of course called "Adobe Reader".
I also hoped this submission could trigger a discussion about the code style.
This guy is a Mozilla contributor, probably what other posts mean when they talk about "great programmers", right?
Yet, for now, his implementation is a single 3000 line file.
[edited for alternate ending]
So there is this guy who writes a PDF viewer in a 3000 line file, and the guy who writes another simple web app neatly organized in 42 files... Which one would you want on your team?
3000 lines of spaghetti code is one thing. But this code is pretty well modularized. The author obviously knows what he's doing.
Maybe he finds it easier to jump around the file that way. I know I've built up projects like that. When it starts to transition from tech demo to real product, I would probably break it up into different files (util.js, lexer.js, etc). But no harm in the meantime.
A simple web app spread across 42 files might actually be terrible and over-engineered. Or it might not. The moral is: don't judge code quality by isolated metrics.
Which one would I want on my team? - the one that produced game changing results. If I need to send in some students with cattle prods and bio-hazard suits after the fact - so be it.
I have seen _many_ important demos and prototypes that were truly ear-bleeding - to the point where I would say it is a common characteristic of this sort of work.
I think this is far from production-ready code and this guy just wanted compact code. Perfectly fine for a quick copy/paste demo.
Another thought - this could be the output of his packed code, sort of like what we get when downloading MooTools unminified - one big file, but that's not how the authors store the code internally.
He's getting stuff done. That's what great programmers do. Also, a PDF viewer isn't simple. If he can write a PDF viewer -- that works with a variety of PDF files -- in a short space of time, I wouldn't really mind how he organizes the code. It's the research behind the code that takes the time here.
What's wrong with a 3000 line JS file? If the code is organized well, a good editor with an outline view makes jumping around 3000 lines no problem. Actually, I would prefer it over multiple files if I feel the code I wrote belongs to that "module".
What are you asking me to agree about? I would never get caught dead working on a 3000 line file, and honestly, that seems like a terrible way to have to add new features.
Ah, this morning someone told me that if I wanted to get serious about my constant folding I should be aware the GCC impl was XXXX LOC (in the same neighborhood as what you posted) so I assumed :)
Really? Downvoted this much for simply implying that I don't appreciate 3000 line files? I didn't imply that it was bad or wrong, just that I wouldn't prefer to work with that type of style. And I hardly think that a couple of hyper counter examples take away from my own opinion on coding style.
I apologize for editing my comment - after reading yours - to try to convey my message better. And failing to do that, and making yours kinda senseless. Will avoid that in the future.
Here's a comparison of the test file micheljansen links to (thanks!) compared with the same PDF in Crocodoc:
- Canvas rendering: http://bit.ly/kU0mlW
- Crocodoc rendering: http://bit.ly/je2wcv
Edit: By the way, we're really impressed by this canvas implementation :-)