
Thanks! The pitch is that HTML to PDF conversion tools, as a rule, are not very good. Docverter is actually on its third iteration of that particular conversion because the first two didn't come close to providing satisfactory results.



That's interesting. So you've built your own HTML to PDF converter? Can you provide an example (a screenshot, maybe) where your version excels against an existing solution?


It's a small web service that wraps around a Java library named Flying Saucer, so there isn't anything to look at really.

Flying Saucer beats the alternatives I looked at in a few ways. First, Pandoc's built-in PDF writer uses a LaTeX intermediary, which doesn't support the things web writers have come to expect. Second, the other tools were WebKit-based and variously didn't support the paged media spec, didn't support embedding fonts, or both. The rest were custom one-off or desktop tools that wouldn't work the way I need for Docverter.
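To make the paged media and font points concrete, here is a rough sketch of the kind of input in question. It only builds a well-formed XHTML string with standard CSS `@page` and `@font-face` rules and does not call any converter. The font path, the family name "Example", and the `build_document` helper are made up for illustration; whether a given tool honors these rules is exactly what varies between converters.

```python
import xml.etree.ElementTree as ET

# Hypothetical font file and family name, for illustration only.
FONT_PATH = "fonts/Example.ttf"


def build_document(body_html: str) -> str:
    """Wrap body markup in XHTML with paged-media and font-face CSS."""
    css = f"""
    @page {{
      size: A4;
      margin: 2cm;
      @bottom-center {{ content: counter(page) " / " counter(pages); }}
    }}
    @font-face {{
      font-family: "Example";
      src: url("{FONT_PATH}");
    }}
    body {{ font-family: "Example", serif; }}
    h1 {{ page-break-before: always; }}
    """
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        f"<head><style>{css}</style></head>"
        f"<body>{body_html}</body></html>"
    )


doc = build_document("<h1>Chapter 1</h1><p>Hello</p>")

# XML-based converters need well-formed input; this raises ParseError otherwise.
root = ET.fromstring(doc)
```

A converter without paged media support would typically ignore the `@page` block, so you lose page size, margins, and running page numbers.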


Just so you are aware, Flying Saucer, while nice when you first use it, has tonnes of bugs and isn't really being developed these days. You'll find yourself diving into the code more and more because the output is substandard. We used it for years and have now moved away because we couldn't stand customising it for every little edge case any more, plus it doesn't support HTML5/CSS3, which is essential nowadays. Take a look at the codebase: you won't want to be adding to that! Additionally, it expects documents to be completely in memory, which means it will sometimes take down your Java server.


Out of curiosity, what did you move to?


We've been using PhantomJS, but it has its issues too. I'm not sure there is a good solution anywhere in the open source world, unfortunately, and tools like Prince are expensive.


Good to know. Thanks.


What features do web writers expect that LaTeX doesn't have? I think LaTeX supports fonts (well, XeLaTeX does).


In particular, styling and layout with CSS. People seem to want seamless HTML to PDF conversion, and providing that without a LaTeX intermediary seems to be the best way to go.



