jgresula's comments

My current plan is to charge for conversion tokens, but I haven't decided how much yet.


Check out: http://www.htm2pdf.co.uk/htm2pdf-web-service.aspx Their pricing indicates 40,000 conversions for $90. I'd pay that.


A similar service, which also supports PDF by e-mail: http://www.web2pdfconvert.com


No special reason - I just don't know PHP. But it is on the todo list.


Sorry, I don't know why. Your site does not validate, but it could be a problem on my side as well.


That's a non-trivial task. There are no objects such as tables, styles, lists or paragraphs in PDF, so you would need to reconstruct this kind of information. Also, text and vector graphics are positioned absolutely. Tagged PDFs contain some meta-information about the document structure, which could help, but it is still a lot of work.

The fundamental problem is that PDF stores the document's presentation, while HTML defines the document and the presentation is created by the browser. And obviously, restoring a document definition from its presentation is hard, as a lot of information is missing.


> That's a non-trivial task.

Yes, that's true.

I only bring it up b/c if your goal is to turn pdfcrowd into an app that people would pay money for (and I would be one of them), solving that problem would go a long way towards achieving it.


Solving it perfectly is non-trivial (I've known entire PhDs to be spent working on a small subset of the problem). There are a number of products/projects that solve it to some extent (techniques include absolute positioning & making sweeping assumptions about what constitutes a paragraph) - would this be enough for you to consider paying for, given that their assumptions/workarounds might produce HTML files that aren't quite to your 'taste'?
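
To illustrate the kind of sweeping assumption mentioned above, here is a minimal Python sketch, not taken from any of those products. It assumes a PDF extractor has already produced text fragments with absolute positions. The `Fragment` type, the `group_into_paragraphs` function and both thresholds are hypothetical names and values chosen for illustration, and the sample fragments are hand-written rather than read from a real PDF.

```python
import html
from dataclasses import dataclass


@dataclass
class Fragment:
    """A run of text with an absolute position (hypothetical extractor output)."""
    text: str
    x: float     # left edge, in points
    y: float     # baseline, in points, measured from the top of the page
    size: float  # font size, in points


def group_into_paragraphs(fragments, line_tolerance=2.0, paragraph_gap=1.5):
    """Guess paragraphs from positioned fragments.

    Assumptions (illustrative only):
    - fragments whose baselines differ by at most line_tolerance share a line
    - a vertical gap larger than paragraph_gap * font size starts a new paragraph
    """
    # Walk the page top to bottom and collect fragments into lines.
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    paragraphs = []
    prev_y = None
    for line in lines:
        line.sort(key=lambda f: f.x)  # restore left-to-right reading order
        text = " ".join(f.text for f in line)
        y = min(f.y for f in line)
        size = max(f.size for f in line)
        if prev_y is not None and (y - prev_y) <= paragraph_gap * size:
            paragraphs[-1] += " " + text  # close enough: same paragraph
        else:
            paragraphs.append(text)       # large gap: new paragraph
        prev_y = y
    return paragraphs


def to_html(paragraphs):
    """Wrap each guessed paragraph in a <p> element, escaping the text."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


sample = [
    Fragment("world,", 110, 100.5, 12),
    Fragment("Hello", 72, 100, 12),
    Fragment("this is one paragraph.", 72, 114, 12),
    Fragment("A second paragraph.", 72, 150, 12),
]
result = group_into_paragraphs(sample)
print(to_html(result))
```

Even this toy version shows where the guesses break down: multi-column pages, tables and headings all violate the "vertical gap means new paragraph" rule, which is why the output of such tools may not be to everyone's taste.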


There are already many apps and pieces of software that charge for the feature he already has, so I don't see why it is a requirement for him to monetize. It definitely would be an easy feature to charge for, but I think what he has already has potential.


Total noob question, couldn't you programmatically capture a browsershot and then convert that into a PDF?

HTML -> png seems to have been figured out. Is .png -> pdf that hard to do?


No, .png to .pdf is not difficult.

I believe dpapathanasiou's suggestion is not to blindly convert a PDF into an HTML file containing one giant image of the PDF.

Instead, he wants to create an HTML document that maintains the same content and layout as the PDF.


D'Oh! Got myself mixed up there a bit.


NitroPDF does a remarkably good job translating PDF to Doc and RTF. I think the application (windows :() is better/has more output options, but they have a free web service: http://www.pdftoword.com/


1) Since November 2009, launched this March.

2) Never applied.

3) No, self-funded.

4) http://pdfcrowd.com - html to pdf online API


Have a look at http://pdfcrowd.com - it is a service providing an html to pdf online API and it has bindings for python. It is in private beta now, so if you are interested please just contact me and I will send you an invitation key.


Hmm, looks good. Cannot find your email in your profile, can you kindly let me know your email? Or can you email me at rolf.oltmans@gmail.com. Thank you.

