I can't express enough my gratitude on a daily basis for what pandoc enables me to do. I made a simple Emacs script that I use to output files, and I use it constantly for LaTeX PDFs, HTML output, RevealJS slides, and odt/docx/etc. All with bibliographies from Zotero in zillions of formats. As a professor and journalist, I need to use a wide range of output formats, but as a human being I like to work in clean, simple text files that will never be obsolete. Pandoc, more than any other tool, gives me the freedom to work in any writing environment I like and keep that distinct from whatever weird formatting preferences a journal, magazine, or publisher might have. I've written two books with Markdown and a huge variety of articles. I am so thankful for the care with which it has been built and maintained. Thank you.
Yeah, filters are great. Writing filters is easy: Pandoc basically converts the input document into a universal AST (serialized as JSON), and a filter is just any program that reads this JSON on stdin and writes a modified JSON AST to stdout.
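A minimal sketch of such a filter in Python, using the pandocfilters library (the file name caps.py is mine):

```python
#!/usr/bin/env python3
# caps.py: a pandoc filter that uppercases every string in the document.
# It reads the JSON AST on stdin and writes the modified AST to stdout.
from pandocfilters import toJSONFilter, Str

def caps(key, value, format, meta):
    # Each AST node is a (type, contents) pair; we only touch Str nodes.
    if key == 'Str':
        return Str(value.upper())

if __name__ == '__main__':
    toJSONFilter(caps)
```

Make it executable and run it with `pandoc input.md --filter ./caps.py -o output.html`.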
I wrote a filter that automatically converts URL citations in markdown to "real" citations in any style you want - very useful for writing papers without fighting with bibtex and managing bibliographies manually: https://github.com/phiresky/pandoc-url2cite
I'll be starting a PhD soon and would love to use a pandoc-based workflow (with MD or another format) for the bulk of my writing.
How did you all handle commenting on your writing?
I find converting to odt/doc before sending, and managing all the exported versions with comments etc., becomes quite tedious. But I'm a bit reluctant to force my supervisors to use e.g. git+criticmarkup[1]. I would love to hear your experiences!
- In my case, my supervisors mostly had handwritten notes, which rendered that point a bit moot. However, when I sent the almost-complete draft to a professional copy editor, it was indeed a pain to add the comments. Whether handwritten and scanned, Acrobat comments, or Word comments, they had to be manually input into the markdown file.
- Everything else worked relatively better. It was a bit tedious to type loooong pandoc commands ("pandoc --filter=... etc etc"), so I recently coded pandocmk [1] to make my life easier. It's not super well documented (but it's quite a short script, so readable); the idea is that you type the command-line options as metadata at the top.
I've used that to process tables, essentially using markdown as a commented CSV format. The only nuisance is that a table can't yet have attributes — https://github.com/jgm/pandoc/issues/6317 — the workaround being a pre-filter to copy them from a surrounding div.
I've also toyed with using it to process code blocks, as a dead-simple literate programming tool.
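A sketch of what such a "tangle" step could look like as a filter (the `file=...` attribute convention here is my own, not a pandoc standard):

```python
#!/usr/bin/env python3
# tangle.py: write each fenced code block that carries a file=... attribute
# out to that file, leaving the document itself unchanged.
from pandocfilters import toJSONFilter

def tangle(key, value, format, meta):
    if key == 'CodeBlock':
        # A CodeBlock's value is [(id, classes, key-value pairs), code].
        (ident, classes, keyvals), code = value
        for k, v in keyvals:
            if k == 'file':
                with open(v, 'w') as f:
                    f.write(code + '\n')
    # Returning None keeps every element as-is.

if __name__ == '__main__':
    toJSONFilter(tangle)
```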
Not sure if this is a joke but if you read the linked docs [0] you'll see that the concept of filters is that you the user can write programs (essentially plugin-style) that modify the AST Pandoc generates in order to perform the conversion. But this explanation is ultimately worse than the actual doc page, so I'd recommend just reading that.
- CommonMark, the first formalized Markdown standard, and now the de-facto Markdown standard. https://commonmark.org/ (He's the first listed member of the team.)
I feel like John is probably the single largest contributor to what Markdown is today, other than perhaps the creator of Markdown. Thank you for your work!
The creator of Markdown hasn't touched it in over a decade and yet decided to throw a temper tantrum because CommonMark dared to initially call itself Standard Markdown.
As a software engineer working in a data interoperability role (not that I would claim authority, but pragmatic experience):
I'm not sure of the specifics, but personally I prefer formats that don't evolve over time. So not changing a spec for over a decade should not be considered pathological but actually commendable, if the spec is complete enough for its purpose.
I know vanilla Markdown is too limited for some use cases. But that is no reason to "overwrite" it.
The problem was there was no _specification_. It was a 'how to use' summary, and each implementation could be (and was) different in subtle edge cases.
The point of CommonMark was to define the specification and stick to it.
I agree with GP, I thought it rather sucky that he objected to using the name.
Various markdowns have extension mechanisms, they always have. That's not what the GP was talking about.
The general idea behind your points is sound and correct.
However, the problem is you seem to be generally ignorant of widely known points of knowledge about Markdown.
> not changing a spec for over a decade should not be considered pathological but actually commendable, if the spec is complete enough for its purpose
100% agree in theory, but Markdown's creator never wrote any spec when creating it. Initiatives like CommonMark are efforts to specify an unspecified language, not to evolve nor replace any existing spec.
> So not changing a spec for over a decade should not be considered pathological but actually commendable, if the spec is complete enough for its purpose.
The "nature of the spec is complete enough for its purpose" is the part that's not met, though (at least in many people's minds). The Markdown "spec" (either the description written by Gruber or the `Markdown.pl` file) has ambiguities and inconsistent behavior. My understanding is that there were many requests from the community for this to be clarified, but it never was. So I think a decade of inattention is not commendable in this instance. The CommonMark landing page[0] has some more about this issue.
I agree with your characterization. (I didn't always -- I actually advocated at the time for CommonMark to respect Gruber's wishes and create their own branding [1].)
Sure, Gruber didn't allow CommonMark to use the Markdown name, but I feel like that's not a super big deal compared to what he did do. The Markdown ecosystem wouldn't exist if Markdown hadn't been created in the first place! I'm not confident someone would have made something like Markdown if Markdown was never created: AsciiDoc and reStructuredText came out before Markdown but have not been as successful.
Gruber's original Markdown spec lacked formality -- and that's where CommonMark eventually filled the gaps -- but I think that Markdown's focus on user experience over technicality was the key to its success over competing formats and WYSIWYG editors (the real competition). By the time CommonMark came around, Markdown had already seen viral adoption; three of CommonMark's creators are from large companies that were already prominently using Markdown.
tl;dr I think the original Markdown spec and CommonMark are both significant contributions in their own right!
I had an interesting conversation with John MacFarlane, the author and maintainer of Pandoc (lovely human being and excellent maintainer), and the subject of day jobs came up. He's a professor of philosophy (specializing in logic) at UC Berkeley, which I thought was fascinating. It certainly makes sense given the number of document formats and such that academia deals with.
What is it with amazing professors and musical prowess? My cryptology professor is also a fiddle player! Ivan Damgård, of the Merkle-Damgård construction.
That observation doesn't seem right. There are a lot of people in STEM. Most do not come from upper class parents. Many of the children of upper class parents go into non-STEM fields, including law. (Some go into music, which is not STEM.)
My decidedly working class parents insisted we learn to play piano. Most of my relatives had a piano in the house. I think it was a holdover of the days when home entertainment was self-made.
A friend lived in both the Los Angeles and New Orleans areas. He compared the two as: in LA, the parties of rich people have live music. In New Orleans, the parties of poor people have live music.
And Damgård, mentioned earlier, was born in Denmark in 1957, and plays Danish and Nordic folk music. Postwar Denmark was poor. Perhaps this interview (in Danish) explains why he started? https://www.youtube.com/watch?v=AUF_EkN4Z-g
So, 1) is there a significantly higher proportion of people in STEM who are into music than in non-STEM fields? (and not simply some sort of observational bias), and 2) is the major contributing factor to that high proportion that the parents of those people were upper class? (and not some other factor, like STEM fields paying enough that people have free time for hobbies.)
I would add lots of kids in lower income homes are exposed to music being played by family members, peers, school programs, or church groups (examples). It's true that these kids might not be playing Mozart but there is nothing wrong with bluegrass, gospel, or whatever, to instill a love of playing music.
Maybe listen to "Juke Box Hero" or "Coat Of Many Colors" for inspiration on how people from modest backgrounds can have the same fulfilling experiences as wealthy people. (sorry - personal soapbox)
Actually, Music is STEM, in the true and classical sense - it's only our perverted modern view that has severed music from its moorings in mathematics.
It was even part of the quadrivium of medieval education: Arithmetic, Geometry, Astronomy, and MUSIC! A classical education not only taught these subjects, but how they were all inextricably interrelated (or intertwingled, as Ted Nelson famously says...)
STEM is not a classical term. Don't go thinking that, because we don't follow ancient Greek or medieval educational philosophy, we are somehow perverted.
FWIW, I strongly dislike "STEM" as a term because it makes no sense to me in an educational or philosophical sense. I see it more as an attempt to lower the cost of hiring engineers and scientists by increasing the supply. For example, compare the funding going into getting more programmers and EEs, vs. marine biologists and paleontologists, even though all of them are STEM.
To clarify "no sense to me", I despise Pirsig's "Zen and the Art of Motorcycle Maintenance" because of its insistence on a clean division between romantic and classical views. I view "STEM"'s treatment of the rest of the liberal arts as being similarly incorrect in its dichotomous classification. Eg, mathematics is important for the humanities too.
But it's clear what _def is talking about by "STEM", and there's no need to suggest we or modern culture are following along with a perversion because the conversation isn't aligned with your personal views.
They also provide an excellent break from the kind of thinking required in things like programming IME. I can be exhausted from a day of coding and happily sit down and practice with the piano in a way I couldn't with other intellectual topics like maths.
Pandoc is great at bridging the gap between science-oriented data control needs, and management-oriented reporting needs.
I was on a modeling project that used scripts to generate hundreds of input parameters, embed them in models, run the models, and produce reports. The inputs and outputs shifted a lot over the course of the project, as we came to understand the domain and implications of the work better. At every update, the changes had to be transferred to a Microsoft Word document that went to the project sponsors.
Pandoc made this easy -- we just added scripts to write out the model inputs as Markdown tables, then embedded those tables in a larger writeup, also written in Markdown. Pandoc turned it all into a Word document. Thus, the same toolchain that did the actual work also drove the final report. I really don't think we could have had confidence all the tabular data was right, had it not been automated through Pandoc.
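A sketch of that kind of pipeline (the file names, placeholder comment, and parameters here are illustrative, not the project's actual ones):

```python
# Dump model inputs as a Markdown table, splice it into the writeup,
# and let pandoc produce the Word document for the sponsors.
import subprocess

params = {"growth_rate": 0.03, "discount_rate": 0.07, "horizon_years": 30}

rows = ["| Parameter | Value |", "| --- | --- |"]
rows += [f"| {name} | {value} |" for name, value in params.items()]

with open("writeup.md") as f:
    report = f.read().replace("<!-- PARAMS -->", "\n".join(rows))

with open("report.md", "w") as f:
    f.write(report)

subprocess.run(["pandoc", "report.md", "-o", "report.docx"], check=True)
```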
I would like to start using Pandoc in my commercial software [1] to help convert documents into different formats, but the GPL license makes that difficult (or at least confusing.) I think it's generally fine to call a GPL program from a SaaS application. I believe it's fine as long as it is providing an optional or tangential feature, and your application can continue to perform the core functions when that GPL tool is not present. AGPL licenses go a step further and prevent access to any AGPL commands over the network, so that's when a commercial license is always required.
Am I allowed to distribute GPL programs contained inside a Docker image for on-premise installations? Do I just need to provide proper credit and a link to the source code?
Or is there a commercial license available for Pandoc? (I couldn't find anything.)
UPDATE: I've decided to evaluate pandoc and see if it might be useful for supporting Markdown and Word formats, etc. If it is, then I'll reach out to John MacFarlane and ask about a commercial license (or just something in writing), perhaps in exchange for sponsorship on GitHub.
As a lawyer -- If you are actively running a commercial enterprise, which you seem to be, these are questions for an attorney in the field. Not me, unfortunately, licenses were never in my area of practice. But you probably want to take the time and bit of cash to make sure you're not potentially opening yourself up to litigation.
> I've decided to evaluate pandoc and see if it might be useful for supporting Markdown and Word formats, etc. If it is, then I'll reach out to John MacFarlane and ask about a commercial license (or just something in writing), perhaps in exchange for sponsorship on GitHub.
Better to just use a GPL compatible distribution method: pandoc has 349 contributors; none of them signed a copyright assignment, so you'd need permission from each and every contributor to use the software in a way not permitted by the GPL.
If you need a freelancer with deep pandoc knowledge, please do reach out. I'm happy to help.
You seem to be focused on the intersection of GPL and AGPL code with commercial software which is actually not really relevant other than that you may care more about the legalities under those circumstances. For the GPL, the question is whether your work links in the GPL code. If it merely executes another program in userspace that shouldn't be an issue but you should consult a lawyer if you have serious questions.
I'm not a pandoc user (so far); and have struggled many times in the past with bugs and lacking features in LibreOffice and LaTeX regarding right-to-left text layout and language-specific issues.
My question: How "trustworthy" is pandoc in handling right-to-left content and side-stepping the minefield of target format issues involving such content? Is this subject getting explicit attention from maintainers?
Pandoc should be usable for users of all languages and scripts. It is possible to define the document's language via the `lang` metadata field; the `dir` attribute can be set to `ltr` or `rtl` on individual text elements.
Core contributors are Westerners or Russian (US, UK, Switzerland, Germany, Russia), so we rely heavily on user reports to improve support for non-LTR scripts and languages. But the goal is to make pandoc work flawlessly for everyone.
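For example, a small script like this exercises both mechanisms (file names are mine; I believe the default templates pick up `dir`, and bracketed spans take a `dir` attribute):

```python
# Render a mostly-RTL document with one LTR span to PDF via XeLaTeX.
import subprocess

doc = """---
lang: ar
dir: rtl
---

نص عربي مع عبارة بالإنجليزية: [an English phrase]{dir=ltr}.
"""

with open("rtl-example.md", "w", encoding="utf-8") as f:
    f.write(doc)

subprocess.run(
    ["pandoc", "rtl-example.md", "--pdf-engine=xelatex", "-o", "rtl-example.pdf"],
    check=True,
)
```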
I have used Xe(La)TeX and the bidi package for mixed rtl and ltr script documents. I don't recall any problems with that. There's also a polyglossia package, but I have less experience with that.
There seem to be not so many Haskell applications that succeed to the point where they are of general use, as in not simply useful to programmers doing programming (probably in Haskell). At least this is a frequent observation about Haskell and one I've made myself. https://news.ycombinator.com/item?id=11907839 Obviously around here the ideal is we keep language wars/boosterism/accusations of being a virus etc. out of it (hey, I /like/ Haskell, I've just found it useful for my brain rather than being especially useful for performing data transformations that come my way).
/If/ you accept that premise, why do you think Pandoc has been so very successful where perhaps other applications written in Haskell have not? The problem domain (something about writing parsers)? The contributors? The culture? Something else entirely?
Of course if you reject that premise I'd also be interested to hear your thoughts on it in as much detail as you care to provide.
But there still may be some truth to the claim. A simple fact is that smaller mind share -> fewer programs -> less chance for extremely successful projects. From personal experience: it took me three tries and multiple months to get comfortable enough with Haskell to the point that I was able to write my first contribution to pandoc (the org-mode parser), despite having dabbled in functional-style Lisp for years before that. But Haskell, as used by pandoc, isn't difficult. In fact, I often find it easier to use Haskell, thanks to its excellent type system. It's just very different and requires a bit more investment up front, with huge benefits lurking down the road.
Data to support my claim that Haskell is actually easy to use: over 300 people have contributed to pandoc, with over 100 contributing Haskell code. Many of those contributors have never written any Haskell before, but the type system helped them to find their way.
Just to address the premise with the data in the link you provide: click your link, remove anything that is a compiler, a linter, some other parser of programming languages, a library for use when programming Haskell, or a programming framework, and that list gets very, very dramatically shorter.
I don't think that's entirely fair, fwiw; it's GitHub ordered by stars, so it will turn up things used by programmers for programming in any language. But either way, I don't find the refutation convincing.
I'd love it if the premise were no longer fair, if the data really did not support it. If I want monad tutorials, there are thousands; that is no exaggeration. If I want Haskell applications useful for something that isn't programming a computer: really not much.
I was kind of hoping you'd say something about the parsing problem domain and why that /seems/ to work particularly well with haskell but other domains not quite so much, at least yet, and whether that can be changed or is simply the nature of statically typed, pure functional programming languages (I really hope not).
It's not "successful" let alone "extremely successful" programs so much as "existant" that is the bar that needs clearing first.
Pandoc is great. Haskell works well for those of you hacking on it. I've used it, liked it and thank you for it!
It isn't necessary to have an opinion on the topic at all, of course.
Thank you for the ever-improving org-mode parser. Org-mode is in general difficult since it's a bit of a moving target, so I'm surprised that it's so well supported!
Not sure if I'll ever find the time, but I'd like to make the org parser less necessary for Emacs users. The idea is to write an org exporter which produces pandoc's AST JSON format; all Emacs Org settings would be respected that way, the detour through pandoc's parser would no longer be necessary, and remaining parser incompatibilities wouldn't matter for users exporting from Emacs through pandoc. Well, some day...
That will be great. Org's greatest power is also a weakness – its coupling with Emacs. I mean, it's great in all aspects except getting other people to use it.
Pandoc makes it possible/bearable to interact with the rest of the world (I'm in the process of moving more things to org).
Being able to export directly to pandoc's AST JSON will probably make it possible to avoid using other programs to edit content at all! I'll wait for this day to come; perhaps I'll even learn enough Elisp to contribute until then. ;)
> There seem to be not so many Haskell applications that succeed to the point where they are of general use, as in not simply useful to programmers doing programming (probably in Haskell). At least this is a frequent observation about Haskell and one I've made myself.
Yes, repeatedly, and I'd love to know why you think it matters and what it is indicative of!
Are there any filters/plugins that could create a good workflow for converting a pdf that is multiple pages of very clear text images? Think of each page having a few printed multiple choice questions. Is there an easy way to get it into a text document?
Some command (or commands) that can be wrapped in a script:
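One plausible OCR pipeline, assuming poppler-utils (for pdftoppm) and tesseract are installed (file names are illustrative):

```python
# Rasterize each PDF page at 300 dpi, OCR it, and collect the text.
import glob
import subprocess

subprocess.run(["pdftoppm", "-r", "300", "-png", "questions.pdf", "page"], check=True)

with open("questions.txt", "w") as out:
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for png in sorted(glob.glob("page-*.png")):
        # The special output name "stdout" makes tesseract print the text.
        result = subprocess.run(
            ["tesseract", png, "stdout"],
            check=True, capture_output=True, text=True,
        )
        out.write(result.stdout)
```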
I presume you mean a proprietary license. Probably yes, you just have to obey the license. The Linux kernel and git are also GPL. In general, if you're not linking it into your software you're fine, but see the license for details.
Pandoc is licensed under the GPL version 2 or later. I know of a couple of companies where pandoc is used in proprietary systems server-side. IANAL, so best to consult one for your specific use case.
Pandoc is a tool used daily by those of us who write code notebooks (Rmd or Jupyter) or are into using markdown for our notes and occasionally need to print said notes. It is hard to overstate how useful Pandoc is for me.
I would bet many people who use Pandoc have no idea they rely on it. I don't think Jupyter or RStudio make a big fuss about it even though they both use it.
I’m a big fan of keeping md documents in source control, then publishing them wherever they need to go in the CI/CD pipeline, and I’ve used pandoc a lot for that.
I always ponder whether it’s the most practically useful Haskell tool ever written.
This is great to know. I use markdown for journaling, note taking, and documentation. I don't need to print anything, but if I did then I'd probably go the way of markdown to HTML with custom CSS - now I will give pandoc a try first.
Probably overkill, but I use Pandoc to generate tailored resumes for roles and jobs I’m interested in.
I keep a list of all my skills, experience and education in a YAML file and have a LaTeX template that I clone when creating a new resume. Then it’s just a matter of replacing the template fields with YAML metadata and running Pandoc.
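Roughly like this (names are illustrative; `--metadata-file` needs a reasonably recent pandoc):

```python
# Fill a LaTeX template ($variable$ placeholders) from a YAML file of
# skills/experience/education and render the tailored resume.
import subprocess

subprocess.run(
    [
        "pandoc", "body.md",
        "--metadata-file=cv.yaml",
        "--template=resume-template.tex",
        "-o", "resume.pdf",
    ],
    check=True,
)
```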
I have the same set up to generate both my resume and my website using an HTML template. Makes it easy to update one YAML file and update both my CV and my personal website
The man page is a very nice touch! Do you have the source on GitHub or elsewhere for this harness? I am using reStructuredText and rst2pdf, but this looks so much nicer!
I also use pandoc to generate CVs, happy to know I'm not alone :) I don't do anything as sophisticated as you, but my main resume is in markdown so I use it to create a .pdf or word doc and to apply .css styling where appropriate.
You can write filters in Python and several other languages. These let you perform arbitrary computation triggered by tags in your source document, and let you extend Pandoc’s Markdown to include your own custom tags to do anything you can imagine.
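As a sketch of the idea (the `warning` class is my own convention, not something Pandoc defines):

```python
#!/usr/bin/env python3
# A panflute filter: render a div with class "warning" as an HTML
# admonition box, passing everything else through untouched.
import panflute as pf

def warning_box(elem, doc):
    if isinstance(elem, pf.Div) and "warning" in elem.classes and doc.format == "html":
        # Replace the div with raw HTML around its original contents.
        return [
            pf.RawBlock('<div class="admonition warning">', format="html"),
            *elem.content,
            pf.RawBlock("</div>", format="html"),
        ]

if __name__ == "__main__":
    pf.run_filter(warning_box)
```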
Here is an article where I show how to use Panflute, a library that lets you write filters in Python, and how I wrote a set of filters to automate the tedious parts of writing a complex technical manual:
Pandoc is awesome! One of my favorite use cases is Orger [0], which I'm using to automatically convert data from different services into org-mode for easier local-first/offline search, navigation etc. Often an API gives you markdown (e.g. GitHub), and while I could embed a markdown source block in org-mode, with Pandoc I can just convert it and display it in native Org syntax.
If you want to do single-file conversions with Pandoc without having to install it, try http://markup.rocks/. It’s a compilation of Pandoc into 2.2MB of JavaScript so you can convert documents (and preview their HTML conversion) in your browser as you type. Its source code: https://github.com/osener/markup.rocks.
I most often use http://markup.rocks/ for converting HTML to Markdown and for testing that my reStructuredText syntax is correct when contributing to docs.
Pandoc also has a demo web page for trying it out (https://pandoc.org/try/). The demo supports all of Pandoc’s formats and doesn’t require a large JS download, but it silently truncates inputs to 3,000 characters.
I haven't updated markup.rocks in 5 years, glad to hear it is still useful for others! Reminds me to update Pandoc and switch to https, likely sometime next month. Maybe I can try compiling it to wasm instead of JS this time around.
Let me know if there's anything you'd like to see that would make it more useful for you!
Pandoc is one of the programs that always surprises me with how good it is. Everything I throw at it works perfectly. I write my assignments for class as Markdown or plain text, and it seamlessly turns them into good-looking Word or LaTeX documents.
It's also fantastic for converting my class notes from Markdown with LaTeX equations into beautiful PDFs.
Pandoc is a true work of art. Everything about it embodies the Unix philosophy of "Do One Thing and Do It Well".
I've been using Pandoc (and make) daily for over 6 years for all sorts of document writing (letters, reports, theses, design docs, performance reviews, you name it) and to solve the occasional "interesting" format conversion problem. It's robust, reliable, fast, and a pleasure to use (and script).
I'm in college, and my profs send a lot of .docx files. In general I prefer not to start up LibreOffice, so I just use a script and a mailcap file to view them automatically with pandoc and zathura. I also use it to write both assignments and personal stuff, though for anything long or with weird formatting I prefer LaTeX.
Pandoc works great as a high-level wrapper around LaTeX, where you can write the content in highly readable markdown while adding embedded LaTeX for more complex stuff. Being able to use BibTeX instead of MS Word's god-awful reference system for footnotes was an eye-opener, as was the ability to keep your manuscripts in text-based .md and .tex formats instead of docx, so you can track your revisions with git.
If you do I highly recommend looking into using a reference doc. I struggled to make the markdown -> docx conversion until I set a few reference docs up to keep consistent style.
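The workflow is roughly: dump pandoc's default reference doc once, restyle it in Word or LibreOffice, then point every conversion at it (file names are mine):

```python
# One-time: extract the default reference.docx so its styles can be edited.
import subprocess

with open("custom-reference.docx", "wb") as f:
    f.write(
        subprocess.run(
            ["pandoc", "--print-default-data-file", "reference.docx"],
            check=True, capture_output=True,
        ).stdout
    )

# Afterwards, every conversion picks up those styles.
subprocess.run(
    ["pandoc", "notes.md", "--reference-doc=custom-reference.docx",
     "-o", "notes.docx"],
    check=True,
)
```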
pandoc is one of the few packages (along with tetex) I blacklisted on my distribution for automatic updates, because it seems to pull in hundreds of other packages which are not used by anything else.
I don't know how they did it, but somehow they put dependency hell on a completely new level.
Yes, I'm sure it's a great tool, but there's a limit to how much bloat I can tolerate for a single program.
That would suggest your distro is dynamically linking all the Haskell libraries. On distros that use dynamic linking for everything, pandoc is going to pull in (directly or indirectly) ~130 Haskell libraries.
This has little to do with pandoc and everything to do with how awfully Haskell packages are packaged for some distros. Imagine if installing a program that runs on node would pull in every single npm dependency as its own package.
The Arch maintainers (and some other Linux maintainers) have made the decision to package all Haskell libraries as separate OS packages and to install those as dependencies when you install, say, pandoc. This model of distribution doesn't really make much sense for Haskell binaries, though.
There are a few reasons for this: 1) most people don't have many Haskell binaries, and the few that people use don't share many libraries; 2) Haskell packages are normally statically linked when building executables.
If Linux maintainers would simply build/ship pandoc as a single static executable, all these issues would disappear.
> That's the point though: you should only need one package manager.
That's orthogonal to the issue. Even with just one package manager, how packages are created and maintained is a separate task.
So, if the pandoc package has dependencies vendored, dependency hell is avoided regardless of which package manager is used to install it.
If, however, the pandoc package has all dependencies listed as separate packages, dependency hell is created, again regardless of which package manager is used to install it.
Does that actually matter? To me dependency hell is when you have lots of conflicts where some software requires one version but some other software needs a different version. So you can't upgrade one version without breaking something else.
With pandoc and all the haskell dependencies, the only downside is the length of the list of packages when you upgrade. If it was all bundled up as haskell-all I doubt I'd even notice.
That's probably because it's compiled dynamically in your distro's package manager. If you look for a statically compiled option, it might be more to your taste.
And there are statically-compiled versions available for multiple platforms on Pandoc's download page. (I tend to use those for the Mac, rather than installing through Homebrew.)
Arch, I presume? That's mostly due to a man-power problem on the side of the Arch Haskell maintainers. Try our pandoc Docker images or use pandoc-bin from AUR for a bloat-less version. https://hub.docker.com/u/pandoc
Considering what pandoc does and how it is used, docker is massive overkill imho. What pandoc should actually do is come as a tarball and be buildable the traditional configure / make / make install way, like all Unix tools of a similar fashion. Haskell, atm, is no language for this.
Hahaha, that's actually some quality and funny trolling. Not bad :D
For everybody interested in alternative installation methods: all pandoc releases are available as statically compiled binaries for Linux, and via installers on macOS and Windows. All major package managers ship a more-or-less recent version of pandoc. Compiling is as simple as getting the "stack" tool and running `stack install`.
Pandoc is great but I think it falls a bit short of being a Swiss army knife; there are a lot of conversions it cannot do, like PDF-to-anything. Thankfully Calibre's 'ebook-convert' tool covers many of pandoc's blindspots.
But a real Swiss army knife does not include any magic either: even simply extracting text from a PDF (ignoring all formatting) is completely non-trivial. I don't know of any (non-magical) specialized tool that can convert PDF formatting.
Exactly: Pandoc chooses robustness over buggy, half-baked conversion. A Swiss army knife is no good when you need to debone a tuna. Every tool on a Swiss army knife is sub-optimal. It's a terrible, if popular, analogy in general.
I don't know much about tuna, but I once cleaned a bass with my swiss army knife.
Anyway, I don't really expect Pandoc to do everything, but when you have both Calibre and Pandoc in your toolbox, it sure feels like you could manage close to anything.
Calibre (`ebook-convert`) makes a decent attempt at converting PDFs to other formats. This of course is very far from perfect, but it takes a good stab at it and I've sometimes found the results to be usable (often with some manual cleanup.)
Another example where Calibre complements Pandoc well is when generating ebooks for sideloading onto Kindles. Pandoc can create epubs, which Calibre can in turn convert to mobi.
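For example (file names illustrative):

```python
# Markdown -> epub with pandoc, then epub -> mobi with Calibre's
# ebook-convert, ready for sideloading onto a Kindle.
import subprocess

subprocess.run(["pandoc", "book.md", "-o", "book.epub"], check=True)
subprocess.run(["ebook-convert", "book.epub", "book.mobi"], check=True)
```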
Great thing about Pandoc - it has a clear, descriptive and yet unique name that aptly describes what it does.
That aside, I find the markdown + additional features (e.g. latex math, inline code eval), mainly as implemented in Rstudio and Rmarkdown, to be the sweet spot of power and convenience of typing and legibility in plain text form. Thanks pandoc!
I've been using pandoc a lot recently for converting DRM-free epubs into plain text and piping that into Mac's say command; then I generally pipe that to ffmpeg and output the file to mp3 for compression's sake. (say is a text-to-speech program.) Obviously I only use the audio output for myself. But I find Mac's Books app useful for the audio because you can set the speed up to 2x the original. (I'm sure the say command also has some similar settings.) I even set up my own Automator task to do most of the work for me. I am so thankful to those who made pandoc; it has come in handy time and time again. I used it for tons of my school papers back when I was in school, and now it's my go-to document converter.
EDIT:
I've also used this workflow for reading RFCs for OAuth and such. It's basically just a small curl piped to say. Sometimes, if I feel like reading an article, I'll add a readability-like CLI tool piped between the curl and say commands. Unix is awesome!
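The core of the epub pipeline, as a sketch (file names are illustrative):

```python
# epub -> plain text (pandoc) -> speech (say) -> mp3 (ffmpeg), on a Mac.
import subprocess

subprocess.run(["pandoc", "book.epub", "-t", "plain", "-o", "book.txt"], check=True)
subprocess.run(["say", "-f", "book.txt", "-o", "book.aiff"], check=True)
subprocess.run(["ffmpeg", "-i", "book.aiff", "book.mp3"], check=True)
```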
A lot of tech book publishers actually release their books from their own websites in DRM-free formats like epub: Manning, No Starch Press, and often O'Reilly if you get them from the right place (humblebundle.com is generally a pretty good source for that if you're patient). Sadly, O'Reilly has stopped selling books directly from their website and instead you have to get them from somewhere else (but before that, they were DRM-free).
I've self-published a couple of paperback novels that I create using LaTeX, then I run them through pandoc to get a perfectly formatted .epub that I use to sell the e-book versions.
I'm using pandoc for generating pdf/epub ebooks from GitHub-style markdown. The default output is good enough, and there are various themes that can be selected. But I wanted to customize a lot of things, like chapter breaks, background color for inline code, bullet styles, blockquote style, etc. I didn't know LaTeX, but was able to find snippets from Stack Exchange sites to suit my needs. I wrote a blog post on this: https://learnbyexample.github.io/customizing-pandoc/
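The pandoc invocation for that kind of customization looks roughly like this (file names are mine; the collected snippets live in header.tex):

```python
# Inject collected LaTeX snippets (chapter breaks, inline-code colors, etc.)
# into pandoc's PDF build via --include-in-header.
import subprocess

subprocess.run(
    ["pandoc", "book.md", "--include-in-header=header.tex",
     "--pdf-engine=xelatex", "--toc", "-o", "book.pdf"],
    check=True,
)
```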
I absolutely love Pandoc, I use it in my Makefile based static site generator. Pandoc is probably one of the most valuable pieces of open source tooling next to ffmpeg and imagemagick.
Pandoc for text, ffmpeg for audio/video and imagemagick for images?
I've used pandoc for pdf generation and ffmpeg for some audio recording/encoding/playback. I can't imagine what I would use imagemagick by itself for though (that I wouldn't use some common image processing application for). What do you use imagemagick to do?
Hadn't heard of pandoc before. Momentarily thought it converted from PDF to anything, and my heart leapt. Alas, it only converts to PDF. My hopes dashed...
That's not really a reasonable expectation, as PDF is an output format, not an input format. If you want to make a PDF that others can read, the best solution is to generate a PDF that embeds the original input. LibreOffice can do this.
Yes, it is the only program coded in Haskell I have ever used for anything practical, to my knowledge.
I have heard of others, like git-annex, but not used them myself. I wonder if there are any I use that I just didn't know were written in Haskell.
I also wonder if anything about Haskell makes it particularly suited as the implementation language for Pandoc. It must have a lot of parsers in it, and Haskell is supposed to be good for coding parsers.
There are parser generation libraries and meta-libraries for certain other languages, notably C++. I wonder what Pandoc in C++ would look like. Probably a pretty good parser meta-library could be spun out of such a project.
If this list rounds up the most-used Haskell programs, I can safely conclude that I don't use any Haskell program besides Pandoc.
Apparently I use a few Go programs--Docker, maybe others?--but no Java programs at all, because I delete all the JVMs from my machines without noticeable effect. Likewise, no C# programs, because I have no Mono runtime. Probably no Lisp, Smalltalk, Julia, or OCaml. Some things I run almost certainly are or use Lua, and of course Python, Perl, and even Tcl. I don't know of any in Rust, but it would be hard to tell because of static linking.
I used pandoc with filters written in Haskell for my blog. I was surprised how far I could stretch it before I had to switch to Rust with pulldown-cmark (just went for Rust for learning although it turned out to be a good decision).
Pandoc filters allowed me to transform the AST in useful ways. For example I turned the image tag into HTML figures with captions, used the video tag if the URL was a video, and called ffmpeg to encode the video in another format for browsers that didn't support the other format.
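Here's the video part of that idea sketched in Python with panflute (my originals were in Haskell; the `.mp4` check and markup here are illustrative):

```python
#!/usr/bin/env python3
# A panflute filter: images whose URL points at a video file become
# raw HTML <video> elements.
import panflute as pf

def videos(elem, doc):
    if isinstance(elem, pf.Image) and elem.url.endswith(".mp4"):
        return pf.RawInline(f'<video controls src="{elem.url}"></video>',
                            format="html")

if __name__ == "__main__":
    pf.run_filter(videos)
```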
Pandoc is wonderful. I don’t use it often, but I always have it installed and available.
+1 for being written in Haskell. Indeed, way back when I became interested in Haskell, I think it was noticing that this tool I was using was written in a strange programming language that influenced me to eventually adopt it for many side projects and to write a little book on it.
As much as I like pandoc, I hate how many Haskell dependencies it has on Arch Linux. And the distro is not to blame here; they do it right. In that sense pandoc might be an excellent tool, but for me it's also a reason to think twice before using Haskell in production, because apparently this is a Haskell ecosystem issue.
This is very much an Arch issue. The publicly available debian/fedora pandoc packages are statically linked, and, until two years ago, so was the Arch Linux pandoc package. The change to dynamic linking (and therefore 700+ MB of Haskell-related dependencies) was a deliberate decision made at the time to reduce maintainer burden. A statically linked pandoc is still available on the AUR under the name pandoc-bin.
This is probably a silly question, but the last (and first) time I used pandoc, my conversion of org files to markdown resulted in a lot of whitespace within the document itself. I followed the instructions on the website, but is there a flag that I should have used to get rid of excess whitespace?
I'm the author of pandoc's org-mode parser. Can you drop me a mail (listed on my GitHub profile <https://github.com/tarleb>) or post to the pandoc-discuss mailing list?
However, it's not quite done yet. I'm mostly interested in PDF output, and not having LaTeX was one of the goals, so I use weasyprint for PDF generation. Too bad they are very slow with releases, and I encountered many bugs...
It surprised me that I couldn't find a decent tool to read markdown in a shell. I tried about a dozen tools, but pandoc did it best: it renders markdown well enough to read by feeding it into the man command.
Does anyone have practical experience maintaining an entire website through pandoc generated HTML? Is it worth it, and what are some pitfalls to be aware of?
That's how I generate my website (and everything else). There really are no pitfalls. Whenever something is not working, I discover that the answer is in the official Pandoc manual. I suggest getting a recent Pandoc; the version in your package manager may be a bit old.
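A tiny build script in that spirit (paths and template name are illustrative):

```python
# Convert every Markdown page into a standalone HTML page with one template.
import pathlib
import subprocess

pathlib.Path("site").mkdir(exist_ok=True)
for md in pathlib.Path("pages").glob("*.md"):
    out = pathlib.Path("site") / md.with_suffix(".html").name
    subprocess.run(
        ["pandoc", str(md), "-s", "--template=page.html", "-o", str(out)],
        check=True,
    )
```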
And with hakyll, you get a static site generator powered by all the goodness that is pandoc. Blazingly fast (compared to say, pelican) and easy to extend.
There's a popular template [0] which you can adapt to your needs. I didn't know LaTeX either, so I cobbled together snippets I found on Stack Exchange sites [1] (this was before I knew about that template, else I'd probably have started with that).
I have installed the latest texlive in home directory.
When I invoke 'sudo apt install pandoc' it requires me to install a massive texlive setup at the system level as part of it.
This is not specific to pandoc; the same happens with many other packages. I have anaconda3 installed in my home directory, but image-magick requires a massive numpy/scipy system-level install (ignoring for the moment my bewilderment at why image-magick would require numpy/scipy).
You're asking the system to install a package. System packages are available to all users. If the package is going to work for all users, its dependencies also need to be available to all users. This naturally leads to what you're seeing: the system will not consider software installed only for your user, so it'll end up installing the same dependency system-wide that you had installed in your home directory. While I understand your frustration, I can't immediately think of a better way to handle this.
Considering what you get for 1 GB, it is worth it for most users. I would guess that you aren't the target audience for it if you're that concerned over space. 1 GB of space these days is nothing unless you're using an older system. It's just sitting on your disk, and it takes nothing away from you if you aren't loading it. Pandoc handles tens of file types, and that requires a lot of libraries.
In Ubuntu and Debian the dependency from pandoc to texlive is of the "suggests" type, not "required". So you do not have to install texlive to use pandoc. You may use an interactive front-end like aptitude and simply deselect all the suggested dependencies you don't care about (or configure aptitude not to install suggested deps by default).
I think these packages contain PDF support as well, which makes the whole texlive installation over 1 GB. Even without PDF, texlive is pretty big. I don't think there is a way around it. You can use a docker image to isolate pandoc from the system.