I don't think it's bad faith at all. They're up front about what tools the system is built on. It's more about whether it makes the most sense for you to build and host something like this yourself, or pay for what you use on a service that someone else runs for you. There are pros and cons to each, of course, but I think there are a lot of situations where the latter is going to be better.
One example: if I were building a new product and wanted to add some reporting/exporting features, I would much rather use a service like this. Document conversion is just infrastructure for those features, so any time I spend setting up my own system for it is potentially wasted until I can prove that the features are a success.
If the author is conversant in Haskell (or willing to become so), an interesting way to add value would be to support and document the various output formats, and add new output formats.
Pandoc's an awesome conversion tool, but with so many supported output formats, some are more reliable than others. For example, it can actually output an HTML/JS-based slide deck - using any of four different JS slideshow libraries - but in my experience only one of them is actually usable, and it's not clear how to customise/style the output.
Of course fixing that as a developer is a simple matter of reading docs / code, but if this product is aiming to be "Pandoc for non-developers", that would be an interesting angle.
You're right, I went through and renamed some sections. I did that to reduce confusion, because there are a bunch of parts where the pandoc docs talk about things that don't make sense when you're POSTing instead of using command line options. I did give attribution at the bottom in the Authors section.
You'd generally pay for a service like this if you're running on Heroku or another PaaS and don't want to have to deal with getting Pandoc and various other supporting tools up and running.
Whilst I agree that people should be open about the software they run, let us remember that selling access to GPL software running on your machine is perfectly legal.
Also, for what it's worth the author was very open and explicit about the provenance. In the footer:
"This is a copy of the Pandoc README file, modified to suit Docverter's manifest format."
I'm confused though. In another comment [1] zrail says this (the HTML-to-PDF in particular) is built on a Java library. Is Flying Saucer based on Pandoc? Do you use one sometimes and the other at other times?
Docverter is a collection of a few pieces of software that get used at various times. For example, if you do markdown to docx you'll just be using Pandoc. If you convert to PDF you'll be going through Flying Saucer. If you go markdown to PDF you'll go through both. MOBI conversions go through Pandoc to get an ePub and then through Calibre to get the MOBI.
The point is that you don't have to worry about those pieces, though, since Docverter abstracts over them with a simple API.
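To make that routing concrete, here is a rough Python sketch of the description above. It is not Docverter's actual code: the function name and tool labels are made up for illustration, and treating HTML to PDF as a Flying Saucer-only step is my assumption based on the comments here.

```python
from typing import List


def conversion_steps(source: str, target: str) -> List[str]:
    """Return the tools a conversion would pass through, in order.

    Hypothetical helper that mirrors the routing described in this thread.
    """
    source, target = source.lower(), target.lower()
    if target == "pdf":
        # Assumption: HTML input goes straight to Flying Saucer, while any
        # other input is first converted by Pandoc.
        if source == "html":
            return ["flying-saucer"]
        return ["pandoc", "flying-saucer"]
    if target == "mobi":
        # Pandoc produces an ePub, then Calibre turns that into MOBI.
        return ["pandoc", "calibre"]
    # Everything else (for example markdown to docx) is Pandoc alone.
    return ["pandoc"]
```

So `conversion_steps("markdown", "pdf")` lists both Pandoc and Flying Saucer, while `conversion_steps("markdown", "docx")` lists only Pandoc.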
This is super cool. One thing that would be useful is an "examples" page. That is, have some sample .txt or .html files, and show us what the output .doc, .docx, .pdf, .epub, etc files look like. Just a list of static files, really.
You could also do something more fully-featured, like a sandbox where you can upload your own files. Perhaps it could be an "evaluation plan" which has a maximum of 10 conversions per month. (Then again, $5 isn't really that much to pay to evaluate a service.) Or maybe unlimited conversions with the evaluation plan, but the output files have a watermark?
I had no idea this was based on pandoc until I read the HN comments. So, cool!
Thank you for the kind words. There are a few examples on the API page in the Advanced section but you're right, they should be featured more prominently. Also, the free dev plan will give you full access but will indeed insert watermarks.
I'm impressed someone has had the balls to launch a service without a free-tier. Note that there is a developer's access plan for free though, so no moaning!
This looks great (and is something I'll very likely use on a project I'm working on now!)
My quick "dumb" question -- what's the pitch for using this vs. what I would call more traditional conversion tools? My project will need some HTML -> PDF goodness and I was planning on researching and running some sort of local / server-side package (which I presume exists, though I haven't researched them yet).
Either way, congrats on the launch - this makes a lot of sense and sounds like a great utility.
Thanks! The pitch is that HTML to PDF conversion tools, as a rule, are not very good. Docverter is actually on its third iteration of that particular conversion because the first two didn't provide even close to satisfactory results.
That's interesting. So you've built your own HTML to PDF converter? Can you provide an example (a screenshot, maybe) where your version excels against an existing solution?
It's a small web service that wraps around a Java library named Flying Saucer, so there isn't anything to look at really.
Flying Saucer excels against the alternatives I looked at in a few ways. First, Pandoc's built-in PDF writer uses a LaTeX intermediary, which doesn't support anything that web writers have come to expect. Second, the other tools were WebKit-based and variously didn't support the paged media spec, didn't support embedding fonts, or both. Others were custom one-offs or desktop tools that wouldn't work the way I need for Docverter.
Just so you are aware, Flying Saucer, while nice when you first use it, has tonnes of bugs and isn't really being developed these days. You'll find yourself diving into the code more and more because the output is substandard. We used it for years and have now moved away because we couldn't stand customising it for every little edge case any more, plus it doesn't support HTML5/CSS3, which is essential nowadays. Take a look at the codebase - you won't want to be adding to that! Additionally, it expects documents to be completely in memory, which means it will take down your Java server sometimes.
We've been using PhantomJS, but it has its issues too. I'm not sure there is a good solution anywhere in the open source world, unfortunately - and tools like Prince are expensive.
In particular, styling and layout with CSS. People seem to want seamless HTML to PDF conversion, and providing that without a LaTeX intermediary seems to be the best way to go.
I posted last week about Docverter, my plain text to rich text conversion tool. It's actually ready for people to start using now. I'll be here all day to answer questions.
You're in for a world of pain when you start parsing docx and pptx. On the bright side, if you can figure out a good solution, you'll likely have a solid business model. I would imagine that there would likely be significant demand for converting docx and pptx files into html or markdown, as a service. If you do come up with a nice, well-documented API for all of this, I'd certainly recommend your service. If you come up with an outstanding docx parser, then I'd use your service myself (I am using my own somewhat primitive solution for a current project involving the conversion of docx files).
Here's a few projects to look at, if you haven't already:
You might need to be careful that you don't end up liable for any copyright infringement. I would speak to a lawyer before pulling content from a URL and transferring it to an unrelated destination.
Maybe T&C can protect you in this scenario but maybe not.
The use case I was thinking of was using it on my own website though, for things like printing out a nicely formatted invoice or a printable mockup of a webpage.
Recently I designed some flexible forms meant for printing, but I used php/html/css to generate it. I discovered that it's really hard to get a good quality print out of a webpage. If you use screenshots you get poor resolutions, and direct to print/pdf conversion tools didn't render the CSS all that well.
Don't see how T&C couldn't cover such scenarios in which the user has explicit permission to generate a copy of a webpage.
Can this service handle more complex structures like converting a table in Textile or Markdown into a Table in docx? Can it handle word styling?
I'm building software where the output must be in docx, wondering how far I can go in not having to deal with word automation to get the output I want into a Word Doc.
It can definitely handle word styling if you provide it with a reference docx to copy the styles from. I haven't tried tables. If you'd like, feel free to create a dev account and give it a spin.
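For anyone running Pandoc directly rather than through the service, the matching option is `--reference-doc` (it was called `--reference-docx` before Pandoc 2.0). Whether Docverter passes the file along in exactly this way is an assumption on my part. The sketch below only builds the command line; the helper name and file names are hypothetical.

```python
from typing import List, Optional


def pandoc_docx_command(source_path: str,
                        output_path: str,
                        reference_doc: Optional[str] = None) -> List[str]:
    """Build a pandoc argv for markdown to docx (hypothetical helper)."""
    cmd = [
        "pandoc", source_path,
        "--from", "markdown",
        "--to", "docx",
        "--output", output_path,
    ]
    if reference_doc is not None:
        # Pandoc copies the styles from this docx into the generated document.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine with Pandoc installed.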
This is likely a service I'd use once in a while. Say I want to convert an HTML file to MOBI once per month: the listed pricing plans do not fit my use case. That means one conversion for 5 bucks.
I know it isn't easy to set up pricing, but would there be any "pay per use" option for people like me?
If you're running your app on Heroku or another free PaaS, it's a pain to get pandoc running and stay within the slug size limits. Additionally, if you want to convert HTML to PDF and have reasonable results (i.e. CSS support of any sort) you need to run a secondary conversion process which is also not trivial to set up.
There's a dev plan link at the bottom of the pricing page, specifically for evaluating before you purchase. It watermarks your documents but other than that it's the real deal.
In fact, my observation is that the API documentation was merely copy-pasted from the original. Example:
Pandoc docs[1]: http://dl.dropbox.com/u/144454/hn/from.png
Docverter API[2]: http://dl.dropbox.com/u/144454/hn/to.png
[1] http://johnmacfarlane.net/pandoc/README.html#header-identifi...
[2] http://www.docverter.com/api.html#toc_425
However, the author went the extra mile to rename sections, thus making it sound like the Markdown extensions are in fact Docverter's.
Sorry to be so negative, but this almost seems like acting in bad faith and selling GPL-licensed software as a service under a new name.