more jgresula's comments

jgresula · on March 31, 2010

My current plan is to charge for conversion tokens, but I haven't decided how much yet.

juliancox · on April 1, 2010

Check out: http://www.htm2pdf.co.uk/htm2pdf-web-service.aspx Their pricing indicates 40,000 conversions for $90. I'd pay that.

Tomazaz · on April 14, 2010

Similar, and it also supports PDF by e-mail: http://www.web2pdfconvert.com

jgresula · on March 31, 2010

No special reason - I just don't know PHP. But it is on the todo list.

jgresula · on March 31, 2010

Sorry, don't know why. Your site does not validate, but it could be a problem on my side as well.

jgresula · on March 31, 2010

That's a non-trivial task. There are no objects such as tables, styles, lists or paragraphs in PDF, so you would need to reconstruct this kind of information. Also, text and vector graphics are positioned absolutely. Tagged PDFs contain some meta-information about the document structure, which could help, but it is still a lot of work.

The fundamental problem is that PDF stores the document presentation, while HTML defines the document and the presentation is created by the browser. And obviously, restoring a document definition from its presentation is hard, as a lot of information is missing.
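
To make the reconstruction problem concrete, here is a minimal Python sketch of the kind of guesswork involved. It assumes the text has already been extracted as fragments with absolute page coordinates; the `Fragment` type, the function names and the tolerance values are all hypothetical and not part of pdfcrowd or any PDF library. It only guesses lines and paragraphs from positions, and says nothing about tables, lists or styles.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text at an absolute position (hypothetical type)."""
    x: float      # distance from the left edge of the page
    y: float      # distance from the top edge of the page
    text: str


def group_into_lines(fragments, y_tolerance=2.0):
    """Group fragments whose y positions are within y_tolerance."""
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, read fragments left to right.
    return [sorted(line, key=lambda f: f.x) for line in lines]


def lines_to_paragraphs(lines, max_gap=14.0):
    """Start a new paragraph when the vertical gap exceeds max_gap.

    This is an assumption about layout, not information stored in the PDF.
    """
    paragraphs = []
    prev_y = None
    for line in lines:
        text = " ".join(f.text for f in line)
        y = line[0].y
        if prev_y is not None and y - prev_y <= max_gap:
            paragraphs[-1] += " " + text
        else:
            paragraphs.append(text)
        prev_y = y
    return paragraphs


fragments = [
    Fragment(110, 100.5, "world"),
    Fragment(72, 100, "Hello"),
    Fragment(72, 112, "second line"),
    Fragment(72, 140, "New paragraph"),
]
paragraphs = lines_to_paragraphs(group_into_lines(fragments))
```

Even in this toy case, the result depends entirely on the chosen thresholds. A two-column page or a table would already break both heuristics.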

dpapathanasiou · on March 31, 2010

That's a non-trivial task.

Yes, that's true.

I only bring it up b/c if your goal is to turn pdfcrowd into an app that people would pay money for (and I would be one of them), solving that problem would go a long way towards achieving it.

thepsi · on March 31, 2010

Solving it perfectly is non-trivial (I've known entire PhDs to be spent working on a small subset of the problem). There are a number of products/projects that solve it to some extent (techniques include absolute positioning & making sweeping assumptions about what constitutes a paragraph) - would this be enough for you to consider paying for, given that their assumptions/workarounds might produce HTML files that aren't quite to your 'taste'?
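
As a rough illustration of the absolute-positioning technique, the Python sketch below emits one absolutely positioned `<span>` per text fragment. The input format (tuples of x, y and text in points) and the function name are assumptions made for this example, not the output of any particular product. It preserves the layout but recovers no structure at all, which is why the resulting HTML may not be to everyone's taste.

```python
from html import escape


def to_positioned_html(fragments, page_width, page_height):
    """Render (x, y, text) fragments as absolutely positioned spans.

    Coordinates are assumed to be in points, measured from the
    top-left corner of the page (a hypothetical input format).
    """
    spans = [
        f'<span style="position:absolute;left:{x:g}pt;top:{y:g}pt">'
        f"{escape(text)}</span>"
        for x, y, text in fragments
    ]
    page_style = (
        f"position:relative;width:{page_width:g}pt;height:{page_height:g}pt"
    )
    return f'<div style="{page_style}">\n' + "\n".join(spans) + "\n</div>"


page = to_positioned_html(
    [(72, 100, "Hello world"), (72, 112.5, "a < b")],
    page_width=595,
    page_height=842,
)
```

The output looks like the original page in a browser, but there are no paragraphs, headings or tables in it, so the text does not reflow and is awkward to edit.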

latortuga · on March 31, 2010

There are already many apps and pieces of software that charge for the feature he already has, so I don't see why it is a requirement for him to monetize. It definitely would be an easy feature to charge for, but I think what he already has has potential.

brandnewlow · on March 31, 2010

Total noob question: couldn't you programmatically capture a browsershot and then convert that into a PDF?

HTML -> png seems to have been figured out. Is .png -> pdf that hard to do?

vibhavs · on March 31, 2010

No, .png to .pdf is not difficult.

I believe dpapathanasiou's suggestion is not to blindly convert a PDF into an HTML file containing one giant image of the PDF.

Instead, he wants to create an HTML document that keeps the same content and layout as the PDF.

brandnewlow · on March 31, 2010

D'Oh! Got myself mixed up there a bit.

dmv · on March 31, 2010

NitroPDF does a remarkably good job translating PDF to Doc and RTF. I think the application (windows :() is better/has more output options, but they have a free web service: http://www.pdftoword.com/

jgresula · on March 15, 2010

1) Since November 2009, launched this March.

2) Never applied.

3) No, self-funded.

4) http://pdfcrowd.com - html to pdf online API

jgresula · on March 14, 2010

Have a look at http://pdfcrowd.com - it is a service providing an online HTML to PDF API, and it has bindings for Python. It is in private beta now, so if you are interested, please just contact me and I will send you an invitation key.

simplegeek · on March 14, 2010

Hmm, looks good. I cannot find your email in your profile. Can you kindly let me know your email, or email me at rolf.oltmans@gmail.com? Thank you.