
Personally, I think whatever helps most with the actual writing is what's best. If you find writing in a given dialect of Markdown or LaTeX or Org-mode easiest, do that. For me, that's Markdown with embedded LaTeX; for others, it's Org-mode or RST, and so on.

Pandoc handles these fairly seamlessly and offers many options for PDF engines, though, based on my experience with the edge cases (some entirely solvable with a little Haskell or Lua), I'd say it prefers LaTeX and HTML on the output side and Markdown on the input side.

Since LaTeX is the default route to PDF, it pays to keep that in mind and help LaTeX help you (you can use it inline in Markdown or include it as a preamble in the configuration). Sometimes, though, I've had better luck converting to PDF via HTML (`-t html -o output.pdf`, or chaining on from an `output.html`) for whatever I'm writing at the moment, and other times I'm not stressing LaTeX much and can go straight from Markdown to PDF (for example, when just writing up something with inline maths).

I prefer to avoid LaTeX's and HTML's escaped character encodings and often need far more glyphs than a single Latin font can provide, so I've ended up running into LaTeX's limitations here (even with lualatex and xelatex) more than I suspect is typical. Meanwhile, the standard HTML-to-PDF backend uses Qt, and I've found it handles everything else I've needed when LaTeX isn't the right backend (and that does come up). On one occasion I did have to switch to weasyprint, and that sorted everything. Alternative backends are an unsung power that few tools have; pandoc not only has many built in (or is at least internally aware of them) but will also integrate with whatever CLI you need.
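To make the two routes concrete, here is a minimal Python sketch that only assembles the pandoc command line for each, using pandoc's `-o` (output file), `-t` (intermediate format), and `--pdf-engine` options. The helper name `pandoc_pdf_command` is hypothetical, and which engines actually work depends on what is installed on your machine.

```python
import shlex
from typing import Optional


def pandoc_pdf_command(source: str, output: str, via_html: bool = False,
                       engine: Optional[str] = None) -> list:
    """Return the pandoc argument list for turning `source` into a PDF.

    via_html=False keeps pandoc's default LaTeX route; via_html=True asks
    pandoc to write HTML and hand it to an HTML-to-PDF engine instead.
    engine, if given, is passed through as --pdf-engine (e.g. "xelatex",
    "lualatex", "weasyprint"); None leaves pandoc's own default in place.
    """
    cmd = ["pandoc", source, "-o", output]
    if via_html:
        cmd += ["-t", "html"]  # intermediate format before the PDF engine
    if engine is not None:
        cmd.append(f"--pdf-engine={engine}")
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing needs installing.
    print(shlex.join(pandoc_pdf_command("book.md", "book.pdf", engine="lualatex")))
    print(shlex.join(pandoc_pdf_command("book.md", "book.pdf", via_html=True,
                                        engine="weasyprint")))
```

Running one is then just `subprocess.run(pandoc_pdf_command("book.md", "book.pdf", via_html=True, engine="weasyprint"), check=True)`. Keeping the command in one place makes it cheap to swap engines when one backend trips over an edge case.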

Producing all three outputs (HTML, EPUB, and PDF) may just need a bit of fiddling before each comes out right, depending on how much you're willing to tweak format-specific metadata for each versus accepting the limits of what Pandoc can represent universally in its AST. Invariably, some compromise is required, but the core semantics of Markdown (including extensions) almost always translate without issue. The dialect problem with Markdown really lies where those semantics meet things Markdown doesn't define, such as the lack of an actual metadata header (Pandoc allows a YAML block for some of this, or you just fall back to HTML).
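For reference, pandoc's Markdown reader takes document metadata from a YAML block delimited by `---` lines (its `yaml_metadata_block` extension). The Python sketch below is a simplified, hypothetical helper that only shows the shape of that convention: it splits a leading block off the body and ignores the other placements pandoc accepts.

```python
def split_yaml_metadata(markdown: str) -> tuple:
    """Split a leading YAML metadata block off a Markdown document.

    Simplified sketch: only recognises a block that opens with '---' on the
    very first line and closes with a line of '---' or '...'. Returns
    (yaml_text, body); yaml_text is "" when no such block is found.
    """
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return "", markdown
    for i in range(1, len(lines)):
        if lines[i].rstrip() in ("---", "..."):
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # No closing delimiter: treat the whole thing as ordinary Markdown.
    return "", markdown
```

Anything in that block (title, author, language, and so on) ends up in pandoc's document metadata, which is where the per-format fiddling for HTML, EPUB, and PDF usually happens.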

So, TL;DR: there's no "best" input format except the one you find most comfortable to write the book in, but I find Pandoc is usually best approached from Markdown with the LaTeX or HTML backends. It's powerful and oh so very handy, but it's not going to do all the thinking for you, just a lot of the grunt work, same as any other tool. When in doubt, the user manual is quite readable, and it has answered almost every question I've had. When it doesn't, other people do, and when they don't, it means I'm either going about it the wrong way or I get to solve an actual problem (usually the former). But, as always, the most important thing is actually writing it; distribution comes later, so focus your efforts on that and the tools you need to do it effectively.




> If you find writing in a given dialect of Markdown or LaTeX or Org-mode is easiest, do that.

I find Org-mode the easiest, but as I said in my comment, the conversion quality is not great: Pandoc breaks a lot of Org-mode features in edge cases. One example I shared in my comment was Pandoc breaking internal links.

So by choosing what I find easiest, I have burned many hours troubleshooting why the output does not look right.

That's why I want to draw upon the wisdom of the community here to find out which input format works best, and by best I mean flawlessly. No edge case issues. No rendering flaws. If I get specific recommendations, I'll try them out for some time and then commit to one instead of burning more time trialling all the different input formats.


Unfortunately, the perfect is very much the enemy of the good here. Aside from HTML, I'm afraid PDF and EPUB are very much driven by purpose-built tools designed to show interactively what the output will look like. This means both have accumulated a depth of subtle semantic differences that makes flawless output extremely difficult. Of course, in practice, pandoc can handle the vast majority of what people actually use, but everything will still be hit by edge cases from time to time, leading to subtle issues or incompatibilities between EPUB, PDF, and HTML. Each edge case can, of course, be solved in isolation, so finding something that has already solved the ones you're encountering is ideal, giving you a seamless experience for your work. Sadly, each of those tools is built to solve someone else's specific problem, and so sometimes we just have to accept that we either compromise on something, paper over the gaps by combining the right tools, or write something ourselves. Fortunately, it isn't the 80s anymore, so many of the tools we have are the "right" ones, and pandoc is very good at combining them.

Again, Markdown (with inline LaTeX or HTML) seems to be Pandoc's preferred starting point, and the HTML backends are quite useful (particularly when you don't need full LaTeX), so perhaps there's some luck to be had there, since HTML may preserve Org's linking and the like a bit better, though I don't use Org myself so can't attest to it. And if there's really a problem, then perhaps Pandoc needs some help sorting Org-mode out!


Riffing on crafting pipelines by combining tools...

Org mode can also export HTML and Markdown, so that's three potential pandoc inputs, each with potentially different properties, all of which might be massaged before input. In extremity, an Org-mode parser permits emitting customized input. Then pandoc's parsing and filters permit altering the pandoc AST in flight. And the AST isn't hard (assuming comfort with ASTs), so if some other tool has templates and output one likes, one might skip the pandoc backend and emit the output oneself from the pandoc AST JSON, rather than hoping to persuade that other tool to both accept and generate what's needed.
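As an illustration of altering the AST in flight, here is a minimal Python sketch of a JSON filter in the spirit of the internal-link problem mentioned upthread. It assumes pandoc's JSON representation, where a link is `{"t": "Link", "c": [attr, inlines, [url, title]]}`; the rewrite rule itself (turning `chapter.org#intro` into `#intro`) and the function names are hypothetical, so adapt the condition to whatever your broken links actually look like.

```python
#!/usr/bin/env python3
import json
import sys
from typing import Any, Callable


def walk(node: Any, fn: Callable[[dict], dict]) -> Any:
    """Apply fn to every AST element (a dict with a "t" key), bottom-up."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        node = {key: walk(value, fn) for key, value in node.items()}
        if "t" in node:
            return fn(node)
    return node


def fix_internal_link(elem: dict) -> dict:
    """Hypothetical rule: point 'file.org#anchor' links at '#anchor'."""
    if elem["t"] == "Link":
        attr, inlines, (url, title) = elem["c"]
        if ".org#" in url and "://" not in url:  # leave external URLs alone
            url = "#" + url.split("#", 1)[1]
        return {"t": "Link", "c": [attr, inlines, [url, title]]}
    return elem


def main() -> None:
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"], fix_internal_link)
    json.dump(doc, sys.stdout)


# pandoc runs a JSON filter with the output format as its first argument,
# sends the document as JSON on stdin, and reads modified JSON from stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as an executable script (say `fix_links.py`, a hypothetical name), it would be run as `pandoc book.org --filter ./fix_links.py -o book.html`.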

So for instance, last year I had a project written in a project-specific Markdown dialect, kludged into pandoc-flavored Markdown, parsed with `pandoc -t json`, and with HTML emitted by custom code from the pandoc AST. Embedded directives passed from the dialect through to the emitter, and the HTML templates were copied from non-pandoc tools. In a language with nice pattern matching (Julia's Match), the emitter was a short page of code.

"Avoid reinventing wheels, but sometimes it's easier to assemble a satisficing custom vehicle, than to find and adapt a previously-built one."


Great comment! Thanks for engaging in this discussion and offering some good perspective about my Pandoc issues. Really appreciate it!



