World's first embeddable HTML5 document viewer, from Crocodoc (YC W10)

chwahoo · on Feb 16, 2011

I apologize for being slightly off-topic. I think an open-source PDF -> HTML 5 renderer would be extremely useful and potentially disruptive. I suspect it would produce lots of experimentation into the right way to read, annotate, and manage documents.

This wheel has been reinvented now at a few different companies. Is anyone aware of any projects actively building an open source canvas renderer?

While it would be great if Crocodoc open-sourced their renderer, I'm sure their business model rests (partially) on how difficult it is for others to enter this space.

mckoss · on Feb 16, 2011

I don't get it. Why do you need to embed something that is natively supported in the browser?

stevedewald · on Feb 16, 2011

PDF viewing isn't natively supported by any browser as far as I'm aware; plugins/extensions are required, which makes the content significantly less accessible.

teraflop · on Feb 16, 2011

Chrome is packaged with a PDF plugin, which is as close to "native support" as makes no difference IMO.

TREYisRAD · on Feb 16, 2011

It works quite well, too.

jasonkester · on Feb 16, 2011

We use Scribd's viewer inside of Twiddla so that users can pull up documents inside of whiteboard sessions and collaboratively scribble on them.

I'm sure there are plenty of other tools like ours that need something similar.

windsurfer · on Feb 17, 2011

Frankly, it makes sense to print Printable Document Format documents while showing a user HTML and taking advantage of HTML's features, notably JavaScript and network access. Of course Adobe put both of those features in, but they are not at the same quality that a native browser implementation can provide.

deno · on Feb 16, 2011

If you prefer a plugin-based reader, then this could always be used as a fallback option for users without the plugin. That is, if you just need a vanilla viewer. An embeddable and configurable viewer would be gold if you want to build more advanced functionality on top of PDF documents.

benatkin · on Feb 16, 2011

As someone said in the HN comments for the TechCrunch article, Crocodoc uses their own implementation of a document viewer to support annotation.

alexanderswang · on Feb 16, 2011

Annotation is mostly useful when you view the document again, which means your browser needs to keep the file for some time, like a month or longer. That does not sound like something everyone will do.

guptaneil · on Feb 16, 2011

It's nice to see so many of these tools that solve smaller problems really well being released. I was just considering implementing something like this for a later version of my own application, and now I can probably just use their API when I'm ready for this feature, assuming prices are reasonable. Everybody wins.

madewulf · on Feb 17, 2011

I work at a web agency, and people here are definitely very excited by this product, though maybe not to use it exactly as intended: we have lots of clients who have paper magazines and who would like to create an iPad version.

Currently, the tools that I have seen convert every page into an image and create an application from it (sometimes overlaying some links on the images). This gives poor usability (it is impossible to select text or to change the font size for part of the text). We would definitely be very interested in a tool that could take our documents in PDF and give us back the HTML5 code so that we can edit it manually (embedding videos instead of images, adding some interactive features) before packaging it as an iPad app.

There is definitely a market for tools that make things like that possible. I am wondering if Crocodoc has plans in this field.

asnyder · on Feb 16, 2011

Definitely not the first. Vuzit has been offering a non-Flash document viewer for years. http://www.vuzit.com.

rdamico · on Feb 16, 2011

Ryan from Crocodoc here. It is fair to say that we're not the first non-Flash document viewer -- there are some really solid image-based viewers like Vuzit who have been around a lot longer than we have.

What we are the first to do is leverage new HTML5/CSS3 standards like SVG and text transformations to create a fully embeddable document viewer that renders text natively and has built-in commenting, markup, and PDF export functionality.

You can compare the two different approaches used on the same document here:

http://www.vuzit.com/blog/2010/10/vuzit-now-supports-tight-i... (see embedded viewer under bullet #3)

http://crocodoc.com/uzvIYS

asnyder · on Feb 16, 2011

They do offer PDF export functionality and annotations, but yes, I can see where Crocodoc possibly has some nice advantages. I'm not associated with Vuzit, but I am a user, so I don't necessarily know how they would respond to this.

bane · on Feb 16, 2011

Link to a sample doc:

http://crocodoc.com/CaDZXS?embedded=true

Drop the "embedded=true" to see it with their interface.

Beautiful stuff!

rdamico · on Feb 16, 2011

Here is one of the most complex PDFs we've come across so far, at least in terms of the typography and number of crazy fonts: http://crocodoc.com/KhoD84?embedded=true

thristian · on Feb 16, 2011

That's pretty impressive! However, I normally browse with "Allow web-pages to choose their own fonts" disabled (since 99% of websites have terrible font choices), and in this situation a lot of the text (most of page 37, or the block on the right-hand-side of page 2) is rendered as gibberish; it looks like it's using code-points from the Unicode private-use area. I guess that's probably what the original PDF source document was doing, but it misses the point of rendering things into HTML - even with font-changing enabled, I bet copy/paste doesn't work too well.

Is it possible to ensure all text uses its original code-points, even at the expense of exact reproduction, or is this just a fundamental problem with the technology?
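For anyone who wants to check how much of a page's text actually sits in the private-use area, here is a rough sketch. It is only an illustration: the function names and the sample string are made up, and it says nothing about how Crocodoc's converter really works. It relies on Python's `unicodedata.category`, which reports `Co` for private-use code points.

```python
import unicodedata
from typing import List, Tuple


def private_use_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        # 'Co' is the Unicode general category for private-use code points.
        if unicodedata.category(ch) == "Co"
    ]


def private_use_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are private-use."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    hits = sum(1 for ch in visible if unicodedata.category(ch) == "Co")
    return hits / len(visible)


# Hypothetical sample: two private-use code points between ordinary words,
# standing in for text copied out of a converted page.
sample = "Hello \ue001\ue002 world"
print(private_use_chars(sample))
print(private_use_ratio(sample))
```

A high ratio on text copied from a page would suggest the glyphs only make sense with the embedded font, which is the situation where copy/paste and font overrides break down.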

bane · on Feb 17, 2011

Do you OCR PDFs as well? I deal with a ton of non-OCR'd scanned documents.

bloudermilk · on Feb 16, 2011

Wow! A very impressive UI. Love the UX of the download feature. I hope these guys go far; they're certainly on the right path. The only thing I would have liked to see is hover states on the buttons.

benatkin · on Feb 16, 2011

I like how they provided a live example on the second page.

stavros · on Feb 16, 2011

And which doesn't work in Opera :/

harisenbon · on Feb 17, 2011

This looks absolutely amazing. It even works great on PowerPoint slideshows.

I'm curious, is there any way to programmatically change the currently viewed page? I assume you could just call a 'click' event on the PageDownBtn, but it would be awesome to have some sort of API for making a slideshow presentation....

Andi · on Feb 16, 2011

HTML document viewer: isn't this a browser? To add the social stuff, what about a browser plugin? This sounds like a joke. Why have another window in a window? What do I do when I want to put a presentation online? I publish a website.

nextparadigms · on Feb 16, 2011

Didn't Scribd do this first?

ashamedlion · on Feb 16, 2011

Their HTML5 viewer currently only works on their website; it is not embeddable on your own site.

They embed a Flash app.

Groxx · on Feb 16, 2011

>... Scribd’s embeddable document reader still uses Flash ...

Probably not for much longer, though.

budu3 · on Feb 16, 2011

With all due respect, I know that a lot of work goes into these YC companies, but how is this groundbreaking? Not simply for the fact that this seems like a feature that Scribd provides, but also the fact that this is clearly not the next Google. We're seeing a lot of YC companies and startup founders in general creating these non-compelling web 2.0 features and 'apps' that get a lot of buzz and a decent amount of funding. This is not good for the startup ecosystem as a whole. The VC and angel investors who would have provided some kind of stopgap to weed out the next Pets.com have all drunk the proverbial Kool-Aid and are tripping over themselves to invest in these companies. This is creating a bubble scenario. This will hurt YC in the long term. Please let us return to sanity.

rdamico · on Feb 16, 2011

Ryan from Crocodoc here. The Adobe Acrobat product line alone is a $600M+/year business. We believe that we can expand upon and disrupt that industry just like Gmail did to Outlook and Google Docs is doing to Microsoft Word. So if you ask us, we definitely have big plans for Crocodoc.

sedachv · on Feb 16, 2011

The HTML5 canvas tag is basically a Cairo surface minus a few functions. That means there's a lot of desktop software that can finally be moved to the web. It also means the entry barrier is fairly low. This is great for disrupting desktop software businesses and destroying the market.

I agree with you that it's still not clear whether the valuations for these new companies are reasonable.

If I had a bunch of money right now, I'd be buying under-valued desktop software properties and moving them onto the web.

ThomPete · on Feb 16, 2011

Will it embed the correct fonts?

betageek · on Feb 16, 2011

If it does embed the correct fonts, that raises a legal issue from a font-licensing viewpoint.

sjs382 · on Feb 16, 2011

Do these embedded documents pass link popularity?