Thanks! The pitch is that HTML to PDF conversion tools, as a rule, are not very ...

highace · on Oct 12, 2012

That's interesting. So you've built your own HTML to PDF converter? Can you provide an example (a screenshot, maybe) where your version excels against an existing solution?

zrail · on Oct 12, 2012

It's a small web service that wraps around a Java library named Flying Saucer, so there isn't anything to look at really.

Flying Saucer beats the alternatives I looked at in a few ways. First, Pandoc's built-in PDF writer uses a LaTeX intermediary, which doesn't support the things web writers have come to expect. Second, the other tools were WebKit-based and variously didn't support the CSS paged media spec, didn't support embedding fonts, or both. Others were custom one-off or desktop tools that wouldn't work the way I need for Docverter.
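
For anyone unsure what "paged media" and font embedding look like in practice, here is a rough sketch of the kind of input involved. Everything in it is an assumption for illustration: the stylesheet, the font path `fonts/body.ttf`, and the helper names `build_xhtml` and `is_well_formed` are hypothetical and not taken from Docverter. It only builds an XHTML string carrying `@page` and `@font-face` rules and checks that it parses as XML, since Flying Saucer takes XML/XHTML input. Whether a given converter honors each rule is exactly the difference described above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical stylesheet: page size and margins, a page-number footer in a
# margin box, and an embedded font. Support for each rule varies by converter.
PAGE_CSS = """
@page {
  size: A4;
  margin: 2cm;
  @bottom-center { content: "Page " counter(page) " of " counter(pages); }
}
@font-face {
  font-family: "BodyFont";
  src: url("fonts/body.ttf");
}
body { font-family: "BodyFont", serif; }
h1 { page-break-before: always; }
"""


def build_xhtml(title, body_html):
    """Wrap an already well-formed body fragment in an XHTML document."""
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>" + escape(title) + "</title>"
        '<style type="text/css">' + PAGE_CSS + "</style></head>"
        "<body>" + body_html + "</body></html>"
    )


def is_well_formed(xhtml):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xhtml)
        return True
    except ET.ParseError:
        return False


doc = build_xhtml("Report & Notes", "<h1>Chapter 1</h1><p>Hello.</p>")
```

A converter that implements the paged media rules would be handed `doc` as its input; the rendering step itself is left out here because it depends on the tool.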

sciboy · on Oct 12, 2012

Just so you are aware, Flying Saucer, while nice when you first use it, has tonnes of bugs and isn't really being developed these days. You'll find yourself diving into the code more and more because the output is substandard. We used it for years and have now moved away because we couldn't stand customising it for every little edge case any more, plus it doesn't support HTML5/CSS3, which is essential nowadays. Take a look at the codebase - you won't want to be adding to that! Additionally, it expects documents to be completely in memory, which means it will sometimes take down your Java server.

zrail · on Oct 12, 2012

Out of curiosity, what did you move to?

sciboy · on Oct 13, 2012

We've been using PhantomJS, but it has its issues too. I'm not sure there is a good solution anywhere in the open source world, unfortunately - and tools like Prince are expensive.

zrail · on Oct 12, 2012

Good to know. Thanks.

vamega · on Oct 12, 2012

What features do web writers expect that LaTeX doesn't have? I think LaTeX supports fonts (well, XeLaTeX does).

zrail · on Oct 12, 2012

In particular, styling and layout with CSS. People seem to want seamless HTML-to-PDF conversion, and providing that without a LaTeX intermediary seems to be the best way to go.