Hacker News new | past | comments | ask | show | jobs | submit login

> If you find writing in a given dialect of Markdown or LaTeX or Org-mode is easiest, do that.

I find Org-mode the easiest but like I said in my comment, the conversion quality is not great. Pandoc breaks a lot of stuff in Org-mode in edge cases. One example I shared in my comment was Pandoc breaking internal links.

So by selecting something I find the easiest I have burned many hours of troubleshooting figuring out why the output does not look right.

That's why I want to draw upon the wisdom of the community here to find out which input format works best and by best I mean flawlessly. No edge case issues. No rendering flaws. If I get the specific recommendations, I'll try them out for sometime and then commit myself to it instead of burning more time trialling all of the different input formats.




Unfortunately, the perfect is very much the enemy of the good here. Aside from HTML, I'm afraid that PDF and EPUB are very much driven by purpose-built tools designed to show interactively what it will look like as output. This means that they've both delved into a depth of subtle semantic differences that makes flawless output an extremely difficult task. Of course, practically, pandoc can resolve the vast majority of what people actually use, but everything will still be hit by edge cases from time to time, leading to subtle issues or incompatibilities between EPUB, PDF, and HTML. Each edge case can, of course, be solved in isolation, so finding something that's solved the ones you are encountering already is the ideal, providing a seamless experience for your work. Sadly, each of those is built to solve someone else's specific work, and so sometimes we just have to accept that we either need to compromise on something, we need to paper over the gaps by combining the right tools, or we have to write something ourselves. Fortunately, it isn't the 80s anymore, so many of the tools we have are the "right" ones, and pandoc is very good at combining them.

Again, I find that Markdown (with inline LaTeX or HTML) seems to be Pandoc's preferred starting point, and that the HTML backends are quite useful (particularly when not needing full LaTeX), so perhaps there's some luck to be had there, since HTML may preserve Org's linking and such a bit better, though I don't use Org myself so can't attest to it. And if there's really a problem, then perhaps Pandoc needs some help sorting Org-mode out!


Riffing on crafting pipelines by combining tools...

Org mode can also export html and markdown, so that's three potential pandoc inputs, with potentially different properties. All of which might be massaged before input. And in extremity, an org-mode parser permits emitting customized input. Then pandoc's parsing and filters permit altering the pandoc ast in flight. And the ast isn't hard (assuming comfort with ASTs), so if some other tool has templates and output one likes, one might skip the pandoc backend and emit it oneself from pandoc ast json. Rather than hoping to persuade that other tool to both accept and generate what's needed.

So for instance, last year I had a project written in a project-specific markdown dialect, kludged to pandoc-flavored markdown, parsed with `pandoc -t json`, and html emitted custom from the pandoc ast. With embedded directives from dialect to emitter. And html templates copied from non-pandoc tools. In a language with nice pattern matching (julia's Match), the emitter was a short page of code.

"Avoid reinventing wheels, but sometimes it's easier to assemble a satisficing custom vehicle, than to find and adapt a previously-built one."


Great comment! Thanks for engaging in this discussion and offering some good perspective about my Pandoc issues. Really appreciate it!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: