I apologize for being slightly off-topic. I think an open-source PDF -> HTML 5 renderer would be extremely useful and potentially disruptive. I suspect it would produce lots of experimentation into the right way to read, annotate, and manage documents.
This wheel has been reinvented now at a few different companies. Is anyone aware of any projects actively building an open source canvas renderer?
While it would be great if Crocodoc open-sourced their renderer, I'm sure their business model rests (partially) on how difficult it is for others to enter this space.
PDF viewing isn't natively supported by any browser as far as I'm aware; plugins/extensions are required, which makes the content significantly less accessible.
Frankly, it makes sense to print PDF (Portable Document Format) documents while showing the user HTML and taking advantage of HTML's features, notably JavaScript and network access. Of course, Adobe put both of those features into PDF, but not at the quality that a native browser implementation can provide.
If you prefer a plugin-based reader, an HTML viewer could always be used as a fallback for users without the plugin. That is, if you just need a vanilla viewer. An embeddable and configurable viewer would be gold if you want to build more advanced functionality on top of PDF documents.
Annotation is mostly useful when you view the document again, which means your browser needs to keep the file for some time, like a month or longer. That does not sound like something everyone will do.
It's nice to see so many of these tools that solve smaller problems really well being released. I was just considering implementing something like this for a later version of my own application, and now I can probably just use their API when I'm ready for this feature, assuming prices are reasonable. Everybody wins.
I work at a web agency, and people here are definitely very excited by this product, but maybe not to use it exactly as intended: we have lots of clients who have paper magazines and who would like to create an iPad version.
Currently, the tools that I have seen convert every page into an image and create an application from it (sometimes overlaying some links on the images). This gives poor usability (it is impossible to select text or to change the font size for part of the text). We would definitely be very interested in a tool that could take our documents in PDF and give us back the HTML5 code so that we can edit it manually (embedding videos instead of images, adding some interactive features) before packaging it as an iPad app.
There is definitely a market for tools that allow things like that. I am wondering if Crocodoc has plans in this field.
Ryan from Crocodoc here. It is fair to say that we're not the first non-Flash document viewer -- there are some really solid image-based viewers like Vuzit who have been around a lot longer than we have.
What we are the first to do is leverage new HTML5/CSS3 standards like SVG and text transformations to create a fully embeddable document viewer that renders text natively and has built-in commenting, markup, and PDF export functionality.
You can compare the two different approaches used on the same document here:
Vuzit does offer PDF export functionality and annotations, but yes, I can see where Crocodoc might have some nice advantages. I'm not associated with Vuzit, but I am a user, so I don't necessarily know how they would respond to this.
Here is one of the most complex PDFs we've come across so far, at least in terms of the typography and number of crazy fonts: http://crocodoc.com/KhoD84?embedded=true
That's pretty impressive! However, I normally browse with "Allow web-pages to choose their own fonts" disabled (since 99% of websites have terrible font choices), and in this situation a lot of the text (most of page 37, or the block on the right-hand-side of page 2) is rendered as gibberish; it looks like it's using code-points from the Unicode private-use area. I guess that's probably what the original PDF source document was doing, but it misses the point of rendering things into HTML - even with font-changing enabled, I bet copy/paste doesn't work too well.
Is it possible to ensure all text uses its original code-points, even at the expense of exact reproduction, or is this just a fundamental problem with the technology?
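To make the private-use-area concern concrete, here is a minimal Python sketch of how one might measure it on text extracted from a rendered page. It is not Crocodoc's method; the function names `is_private_use` and `private_use_ratio` are hypothetical, and the only assumption is that you already have the page text as a string (for example, from copy/paste).

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character's Unicode general category is 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


def private_use_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are private-use code points.

    Hypothetical helper: a high ratio suggests the text only makes sense
    when drawn with the document's own embedded font.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_private_use(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    # U+E000 and U+E001 are in the Basic Multilingual Plane private-use area.
    sample = "Title \ue000\ue001"
    print(f"{private_use_ratio(sample):.2f}")
```

A ratio near zero means the text should survive a font change or copy/paste; a ratio near one means the characters carry no standard meaning outside the original font.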
Wow! A very impressive UI. Love the UX of the download feature. I hope these guys go far; they're certainly on the right path. The only thing I would have liked to see is hover states on the buttons.
This looks absolutely amazing.
Even works great on PowerPoint slideshows.
I'm curious, is there any way to programmatically change the currently viewed page? I assume you could just trigger a 'click' event on the PageDownBtn, but it would be awesome to have some sort of API for making a slideshow presentation.
HTML document viewer - isn't this a browser?
To add the social stuff, what about a browser plugin?
This sounds like a joke.
Why have another window in a window?
What do I do when I want to put a presentation online? I publish a website.
With all due respect, I know that a lot of work goes into these YC companies, but how is this groundbreaking? Not simply for the fact that this seems like a feature that Scribd provides, but also the fact that this is clearly not the next Google. We're seeing a lot of YC companies, and startup founders in general, creating these non-compelling web 2.0 features and 'apps' that get a lot of buzz and a decent amount of funding. This is not good for the startup ecosystem as a whole. The VCs and angel investors who would have provided some kind of stopgap to weed out the next Pets.com have all drunk the proverbial Kool-Aid and are tripping over themselves to invest in these companies. This is creating a bubble scenario. This will hurt YC in the long term. Please let us return to sanity.
Ryan from Crocodoc here. The Adobe Acrobat product line alone is a $600M+/year business. We believe that we can expand upon and disrupt that industry just like Gmail did to Outlook and Google Docs is doing to Microsoft Word. So if you ask us, we definitely have big plans for Crocodoc.
The HTML5 canvas tag is basically a Cairo surface minus a few functions. That means there's a lot of desktop software that can finally be moved to the web. It also means the entry barrier is fairly low. This is great for disrupting desktop software businesses and destroying the market.
I agree with you that it's still not clear whether the valuations for these new companies are reasonable.
If I had a bunch of money right now, I'd be buying under-valued desktop software properties and moving them onto the web.