Ask HN: Any Need for Office to PDF Conversion API?
16 points by jcnnghm on July 11, 2010 | 29 comments
For a project I was working on, I needed the ability to convert Microsoft Office files (Word, Excel) into PDF files so that I could work with them more easily. I created a REST API that allows this conversion very easily; POST the Office file, and once the conversion is complete, receive a POST with the converted PDF. I was thinking that this service could be useful for other people's projects. Before I go through the effort of creating a website and documentation, does anyone else have a need for this sort of service?
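A minimal sketch of what submitting a file to such a service might look like from the client side. The endpoint URL, the `filename` and `callback` query parameters, and the content type are all assumptions made for illustration; the post does not describe the actual interface.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; the post does not give a real URL.
API_URL = "https://converter.example.com/conversions"


def build_conversion_request(office_bytes, filename, callback_url):
    """Build (but do not send) a POST that submits an Office file.

    The callback query parameter is an assumed way of telling the
    service where to POST the finished PDF.
    """
    query = urllib.parse.urlencode(
        {"filename": filename, "callback": callback_url}
    )
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=office_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


req = build_conversion_request(
    b"...", "report.doc", "https://myapp.example.com/pdf-ready"
)
# urllib.request.urlopen(req) would actually submit the file; the PDF
# would then arrive later as a POST to the callback URL.
```

The asynchronous callback means the client also needs a small HTTP handler at the callback URL to receive the converted file.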



Before you build it, you might be interested to know that Scribd (and Docstoc, and others) have long had APIs that offer this as a free service. We have seen pretty good usage at Scribd, but never enough to think that we could charge for it.

On the other hand, APIs for document sharing sites are more complex and offer a lot of other functionality, which may scare off users who just want the transformation. I could see a market opportunity for a Twilio-style API that was very targeted at this functionality. I still don't think you'd make a great deal of money charging for it, but I could see it getting some use.


I had no idea that Scribd could do this. I think I'd rather just use the Scribd API than do it myself, and I can probably use some of the other functionality as well. Thanks for the heads up.


Have a look at JODConverter - http://www.artofsolving.com/opensource/jodconverter

I have used it in the past from my Java applications in the same way you intend to. Here is a list of supported formats of conversion: http://www.artofsolving.com/opensource/jodconverter/guide/su...


JODConverter is definitely a reasonable solution for most applications. You can also write a Python script that leverages OpenOffice's Python UNO API to do the same thing: a basic script that turns Office files into PDFs is only ~20 lines of Python.

For really serious applications, though, you will eventually run into the limits of OpenOffice's support for MS Office formats. If you are very picky (or have a very picky client) about the quality of the conversions, you'll end up having to use Microsoft Office to do the conversions. This is a lot less pleasant to set up, so I wouldn't do it unless you have to.
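For a rough idea of how small the OpenOffice route can be, here is a sketch that swaps the UNO API for the command-line converter. It assumes a LibreOffice-style `soffice` binary on the PATH that supports `--headless --convert-to`; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def soffice_pdf_command(source, outdir):
    """Return the argv for a headless soffice PDF conversion."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source, outdir):
    """Run the conversion and return the path where the PDF should land.

    soffice names the output after the input file's stem.
    """
    subprocess.run(soffice_pdf_command(source, outdir), check=True)
    return Path(outdir) / (Path(source).stem + ".pdf")
```

Calling `convert_to_pdf("report.docx", "out")` would shell out to soffice and return `out/report.pdf`, subject to the same fidelity limits described above.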


I was looking for one and didn't find one that was simple enough, so I ended up installing unoconv on my server. It does the job.


I've seen this done (and used on a few occasions) as a standalone service for end users. Usually this is ad supported. I can think of a few applications I might use this for, such as accepting school assignments without having to trust all of the students to submit proper file types. Outside of advertising, how might you monetize such a service? Or would it be a public service?


I was thinking of a Twilio-style model: you sign up, and the account gets credited with a few dollars for testing. Then there would be a small fee per file. I was thinking $0.05 to $0.10 per file, or a penny a page.
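To make the two pricing options concrete, a quick sketch comparing them. The volume (1,000 files of 3 pages each) is an invented example, and amounts are kept in whole cents to avoid floating-point rounding.

```python
def per_file_cents(files, cents_per_file):
    """Total charge in cents under flat per-file pricing."""
    return files * cents_per_file


def per_page_cents(files, pages_per_file, cents_per_page=1):
    """Total charge in cents under per-page pricing (a penny a page)."""
    return files * pages_per_file * cents_per_page


# Hypothetical month: 1,000 files averaging 3 pages each.
low = per_file_cents(1000, 5)      # $0.05 per file
high = per_file_cents(1000, 10)    # $0.10 per file
by_page = per_page_cents(1000, 3)  # $0.01 per page

print(low / 100, high / 100, by_page / 100)  # 50.0 100.0 30.0
```

At $0.05 per file, the two models charge the same for a 5-page document; shorter documents are cheaper per page, longer ones cheaper per file.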

Haven't really gotten that far with it.

Edit: I should have also mentioned that I don't think this could be ad supported because the end user will likely never see anything but the output file, as it's an API and not a user facing service.


It depends on how much it costs. I needed this for a while. My startup sends faxes from the internet (http://www.hellofax.com). We need to convert files from 40-some different file types to PDF before we send them to the fax server. There were other converters out there, but they were limited in the number of file types that they could convert. So, we are now paying $30+ a month for a Windows server at Rackspace to do it. If you can convert a ton of different file types, we would definitely have been a customer.


I used to work on Google's applicant tracking system. We used some software that did that to convert all the resumes. It didn't work very well for formatting.


Interesting. I was able to solve the quality issues that plague many of the other solutions for this problem. Do they still use something like that?


I don't know, I worked on that in 2005 and have since left Google. But I'd imagine a lot of large companies do something similar with resumes.


When I was doing the application in late fall or early winter, it was Acrobat PDF based, which sadly is broken on Macs with a case-sensitive file system.


It looks like they've built an online application system since I was on that project. 6 years ago, you just emailed your resume to an email address that mapped to a job requisition, and we tried to process whatever attachments came in that way.


I know I am dealing with a client that wants to turn web pages into good looking PDFs ready to print and ready for review by a boss.


The best way by far is wkhtmltopdf, because it uses WebKit to render the page. Most other open source projects use a toy HTML renderer, which is not going to work for in-the-wild web pages.
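A minimal sketch of driving it from a script, assuming the `wkhtmltopdf` binary is installed and on the PATH. The basic invocation is just an input URL or file followed by an output path; the wrapper function names are illustrative only.

```python
import subprocess


def wkhtmltopdf_command(page, pdf_path):
    """Return the argv for rendering a URL or local HTML file to PDF."""
    return ["wkhtmltopdf", str(page), str(pdf_path)]


def render_page(page, pdf_path):
    """Render the page; raises CalledProcessError if wkhtmltopdf fails."""
    subprocess.run(wkhtmltopdf_command(page, pdf_path), check=True)
    return pdf_path
```

For example, `render_page("http://example.com/", "example.pdf")` would write `example.pdf` in the current directory.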


Thanks for that link. Right now I'm doing something similar using a modified version of webkit2png.py that outputs into PDF instead. I'll have to check this out.


A shameless plug for my project, but hopefully you might find it useful.

I run http://pdfcrowd.com which is an online service providing an HTML-to-PDF API. It lets you convert a web page to PDF quite easily. Have a look at the examples at http://pdfcrowd.com/doc/api/


If you can afford it, take a look at Prince XML.


ColdFusion has that pretty much built in...


This is actually the only reason we're still running a ColdFusion server. About once every 6 months I look at the options out there, but until the FSF gets the momentum going behind GNU PDF (http://www.fsf.org/campaigns/priority-projects/), I think CFM'll still rule the roost.

We also rely on pieces like OOONinja's ODF converter to cope with .docx files (http://katana.oooninja.com/w/odf-converter-integrator).


I did something similar to this some time ago. There is software that will convert from HTML to PostScript (html2ps), and then there is ps2pdf, which will convert the PostScript to a PDF file.

I had success with this approach, though I had to tune my HTML and a number of Ghostscript settings to get it to produce exactly the output that I wanted.
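A sketch of that two-step pipeline, assuming both `html2ps` and `ps2pdf` are installed and that `html2ps` writes PostScript to standard output when no output file is given. The tuning of HTML and Ghostscript settings mentioned above is not shown, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def pipeline_paths(html_path):
    """Derive the intermediate .ps and final .pdf paths from the input."""
    html_path = Path(html_path)
    return html_path.with_suffix(".ps"), html_path.with_suffix(".pdf")


def html_to_pdf(html_path):
    """Convert HTML to PostScript, then PostScript to PDF."""
    ps_path, pdf_path = pipeline_paths(html_path)
    # Step 1: html2ps emits PostScript on stdout; capture it in a file.
    with open(ps_path, "wb") as ps_file:
        subprocess.run(["html2ps", str(html_path)], stdout=ps_file, check=True)
    # Step 2: ps2pdf (a Ghostscript wrapper) turns the .ps into a .pdf.
    subprocess.run(["ps2pdf", str(ps_path), str(pdf_path)], check=True)
    return pdf_path
```

Keeping the intermediate `.ps` file around makes it easier to see which of the two steps is responsible when the output looks wrong.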


There has to be some need for this in the legal industry. Good luck!


We take PDFs and convert them to XML, CSV or put them in a DB and provide an access API. Can anyone think of a use case for this type of system?


This idea as a web app is probably MORE viable if you could add in some other conversion services... but it is all kind of muddy waters.


I don't get it. Why not just 'print' to pdf with Adobe's driver?


Because you can't do that programmatically from a system that isn't running Windows, like a Linux web server processing files for display.


Sure you can! WebKit HTML to PDF: http://code.google.com/p/wkhtmltopdf/


Since when does "HTML" include "Office?"


It's been done. I use http://www.fastpdf.com/

If you monetize it and make it easy to submit jobs, you have a business model.



