M4 as a templating language (2020) (chrisman.github.io)
136 points by todsacerdoti on July 29, 2021 | 112 comments



It's worth noting that M4 was invented by Brian Kernighan and Dennis Ritchie. GNU M4 might be what you're using, but it is definitely not the original version, and you probably shouldn't attribute M4 to GNU any more than you do C because they made GCC.


"Invented" is a bit of a strong word here: there were "M" and "m3" macro processors before m4. Christopher Stratcheys GPM (General Purpose Macroprocessor, https://academic.oup.com/comjnl/article/8/3/225/336044) was the first macro processor where macros can occur anywhere in the source text - just like in m4. That makes it the first general such program, and not just the macro processor for some specific programming language (like the preprocessor of a macro assembler)


doesn't help that gnu manual pages & documentation tend to completely ignore the origins of the tool they are emulating

(compare HISTORY sections of BSD userland vs GNU coreutils ones as one simple example)


The GNU M4 manual has a sizable section acknowledging its precursors: https://www.gnu.org/software/m4/manual/m4.html#History


perhaps the full manual, but the manual page (which is what I specifically mentioned) has quite the opposite:

    $ man gm4 |sed -ne 119,121p
    AUTHOR
           Written by Rene' Seindal.


Part of the point of GNU was to be a fresh implementation with no copyright dependency on Unix.


That doesn’t preclude acknowledging the history of the tool.


... which would be essentially plagiarism in other fields.


M4 has some well-known uses like Sendmail and GNU Autoconf, but I don’t think this can be counted in M4’s favor: For operating in a standard Unix-like environment, there is really no good alternative to m4 for generic text macro expansion. You can try to force cpp – the C preprocessor – into this, but it is really only suited to being part of a C compiler pipeline. For better or for worse, m4 is it in the world of Unix and shell tools, even though its syntax is quite annoying to use.

Of course, if you operate within a language ecosystem, you can probably find a nice templating system there. Python, for instance, has Jinja2. Or, if your needs are simple, you can use something more basic, like sed(1).

(Repost: https://news.ycombinator.com/item?id=22770406)


The Mustache template language has a bash version that is good for simple stuff: https://github.com/tests-always-included/mo


Seconding `mo`. It's pretty nice


There are a couple of tools that try to package Jinja2 (and probably also other templating engines like go-templates) in a CLI friendly way:

- https://github.com/M3t0r/tpl (mine)

- https://github.com/kblomqvist/yasha

- https://github.com/mattrobenolt/jinja2-cli


It seems to me that a combination of sed and simple shell scripts would be much better than this. The syntax is really not friendly.


M4 syntax looks that way because it has to work for arbitrary text files.

Quotes can be redefined to an arbitrary string

    changequote(`[[[', `]]]')

And some editors give you syntax highlighting.


> M4 syntax looks that way because it has to work for arbitrary text files.

That seems like an odd assertion given the existence of other templating systems which have friendlier syntax while being capable of generating arbitrary output. Is there a reason why you think only the M4 approach is valid?


Most templating systems work in combination with an external program written in a high-level language, e.g. Ruby for Liquid or Python for jinja2. m4 does not.


And m4 works in combination with a bunch of C code. Virtually any system with m4 can come with an interpreter for a higher level language without running out of space (I can't imagine many uses for m4 at runtime on systems where, say, a perl interpreter wouldn't fit). Why is relying on such a runtime a problem?


No, m4 is written in C but it is autonomous. Liquid needs something like Jekyll around it.


What does “autonomous” mean? There are a ton of template engines that don’t depend on a parent framework, there is no reason you can’t implement all features in C.

Something like mustache for example, which has a C implementation that includes support for includes/partials: http://mustache.github.io/mustache.5.html


Jinja2 is an API and it needs a Python program to determine the values for the substitution variable and invoke the template engine. In contrast the M4 executable is the engine.


I use PHP when I need non-trivial templating. Always struck me as odd that people would invent/use their own templating engines (think HCL, jinja, smarty, m4). All of these are implementable as either a PHP file or a haskell interface


Can anyone explain how m4 is different to tr?


I can.

    tr translates letters, m4 translates words and simple expressions.
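To make that distinction concrete, here is a rough Python sketch (not m4 itself). `tr_like` maps single characters the way tr does, while `expand_words` replaces whole identifiers from a macro table, which is loosely what m4 does with defined names. Both helper names and the macro table are made up for illustration, and real m4 additionally rescans its output and handles quoting and macro arguments.

```python
import re

def tr_like(text, src, dst):
    """tr-style: map each character in src to the character at the same position in dst."""
    return text.translate(str.maketrans(src, dst))

def expand_words(text, macros):
    """m4-style, very loosely: replace whole identifiers that are defined as macros.

    Hypothetical helper; no rescanning, quoting, or arguments as in real m4.
    """
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*",
                  lambda m: macros.get(m.group(0), m.group(0)),
                  text)

print(tr_like("hello world", "lo", "01"))  # -> he001 w1r0d
print(expand_words("Hello NAME, welcome to SITE",
                   {"NAME": "Ada", "SITE": "HN"}))  # -> Hello Ada, welcome to HN
```

Note that `expand_words` leaves `NAMES` alone when only `NAME` is defined, because it matches whole identifiers rather than characters.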


envsubst is the forgotten one.


The context I know M4 from is generating `configure` scripts that generate `Makefile`s that generate programs from a single multi-platform source tree. I can only offer my utmost respect to those who master that multi-level complexity (some call it “hell”).


> I can only offer my utmost respect to those who master that multi-level complexity (some call it “hell”).

I also have massive respect for those people. I only understood the point of autoconf when I ended up reimplementing a fraction of it in GNU Make macros.


For simple templating I use my own version[1] of pp[2] preprocessor. The idea behind it is ridiculously simple: everything between ^#!$ markers is shell script. Output of the script is pasted verbatim in the document.

[1]: https://github.com/TeddyDD/pp.awk

[2]: https://www.mkws.sh/pp.html
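The idea described above (lines between `#!` marker lines are shell, and the script's output is pasted into the document) can be sketched in a few lines of Python. This is only an illustration of the concept, not how pp or pp.awk are implemented: the `preprocess` and `run_shell` names are made up, and this version runs each block in its own `sh` process, which the real tools may not do.

```python
import subprocess

MARKER = "#!"  # a line consisting solely of this toggles script mode

def run_shell(script):
    """Run a script with sh and return its stdout (assumes a POSIX sh on PATH)."""
    return subprocess.run(["sh", "-c", script], capture_output=True,
                          text=True, check=True).stdout

def preprocess(text, run=run_shell):
    """Copy text through, replacing each marker-delimited block with its output."""
    out, script, in_script = [], [], False
    for line in text.splitlines(keepends=True):
        if line.rstrip("\r\n") == MARKER:
            if in_script:                 # closing marker: run what we collected
                out.append(run("".join(script)))
                script = []
            in_script = not in_script
        elif in_script:
            script.append(line)
        else:
            out.append(line)
    # An unterminated block is silently dropped in this sketch.
    return "".join(out)
```

Calling `preprocess("Files:\n#!\nls | wc -l\n#!\nDone.\n")` would paste the line count between the two text lines. The `run` parameter exists only so the shell step can be swapped out, for example in tests.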


That's pretty cool.

I wrote something similar in my static collection of sysadmin tools - https://github.com/skx/sysbox - In my simple pre-processor I only allow two special things:

    #include "file/goes/here"
    #execute ls -l | wc -l

Though there is a more complex version included, which supports all the syntax of the golang text/template library which is more powerful.


Why not just use a shell script and output using cat<<EOF

EOF


This comes to mind:

A Generation Lost in the Bazaar https://queue.acm.org/detail.cfm?id=2349257

(It's mostly a criticism of build tools, and M4 is mentioned because of autoconf)


I needed a simple templating tool recently and thought about m4, but found myself to be too busy to dive into it deeply enough. In the end, I just used a Bash script with a heredoc, using jq/yq to read from JSON/YAML. A poor man's Jinja2:

    #!/usr/bin/env bash

    declare SOMETHING="$(yq eval ".yaml.query" whatever.yml)"

    cat <<EOF
    Here is my template. It says ${SOMETHING}.
    EOF

It's not pretty, but it works! You can obviously shell out to anything in the heredoc, so it's quite flexible. Imports that respect the variables might be a bit tricky; having them as separate scripts like the above and using environment variables to pass information around would probably be easiest.


I had the... pleasure of meeting M4 while working on GNU Bison (a grammar parser generator): after parsing the grammar, it calls M4 to generate the parser (in C, C++, Java, D or other), defining some variables.

So you get a giant c/c++ file with a parser implementation made of M4 definitions, C preprocessor definitions, includes of generated (lookup tables) or user-written (custom parsing actions) files, and it's absolute hell to debug :D


The same story with flex.


m4 is pretty neat, but debugging it is very difficult. It takes a long time to understand when a macro is expanded and how quoting changes that. It could also use a standard library of convenient string manipulation functions, perhaps a dictionary data structure. But at that point you can probably just switch to a more featureful programming language.


... your mother should have warned you about the evils of macro expansion languages. — Leslie Lamport (TeXhax, 1988)


> Gross, right? m4 is great for macros and includes. Not super fun for general programming. But, like immigrants, it gets the job done.

Weird comparison.


I think it’s a reference to the line in the Hamilton song.


This was right after m4 was called gross and was said only to be good for special purposes, that it was to be avoided in general and better alternatives should be used instead. Then it says m4 is like immigrants. If it's a song reference, it's a rather poorly chosen one.


As a European reading this it sounds like the author has some kind of narcissistic take on exactly what kind of immigrants their country has and where they "belong".

Of course what the author wrote, and what I understood needn't align, we're different, with different cultural references, but this stood out to me enough that I stopped reading to revisit HN to find out if anyone else observed the same casual use of what seems like inappropriately dismissive language.


This is yeah, probably a cultural mixup. There's a fair number of people in the US who view and characterize immigrants as freeloaders, when the truth is they work harder than "natives" and our economy depends on them. And like a sibling comment says, it's a reference to a line from Hamilton, which celebrates in lots of ways the US' diverse heritage.


I worked on a macro preprocessor circa 1999 which could have developed into an M4 replacement.

https://web.archive.org/web/20070209223522/http://users.foot...

Then I discovered Lisp so I probably lost interest in MPP because of that.

One distinguishing feature of MPP is that it preserves indentation when generating code. If you have a macro which generates multiple lines of code, and that macro call is indented, then its output gets indented. So in theory you can use this for Python and Yaml and their ilk.

Interestingly, without any prior Lisp knowledge, I based this project on nested list processing. It has programmable read-tables, like Common Lisp, and also lexically scoped local macros, like macrolet. Lisp showed me a vastly better designed architecture of all the infrastructure that I greenspunned inside MPP, which made it seem pointless to continue that project, and pour more effort into ramping up on Lisp.

Probably for the same reason, I never migrated that mpp.html page and its sub-content to my newer home page locations. I have all the code in version control and all. I'm always having to refer to it via archive.org.

I posted about MPP in the comp.lang.lisp newsgroup in 2008, when John Thingstad started a thread with my name in the subject line, asking:

I have noticed you have become dramatically better in Lisp in a relative short period of time. What is you "secret"?

https://groups.google.com/g/comp.lang.lisp/c/xvuivrEnQ9s/m/z...

There is a deleted article, which I think is my first response to the question, which I replaced using the Usenet supersede feature.
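The indentation-preserving expansion described for MPP above can be illustrated with a small Python sketch. This is not MPP's code or syntax: the `%NAME%` call form, the `expand_indented` helper, and the restriction that a call must sit alone on its line are all assumptions made to keep the example short.

```python
import re

def expand_indented(text, macros):
    """Expand %NAME% calls; every line of the expansion inherits the call's indent."""
    out = []
    for line in text.splitlines():
        m = re.fullmatch(r"(\s*)%(\w+)%", line)
        if m and m.group(2) in macros:
            indent = m.group(1)
            # Prefix each non-empty body line with the indentation of the call site.
            out.extend(indent + body_line if body_line else body_line
                       for body_line in macros[m.group(2)].splitlines())
        else:
            out.append(line)
    return "\n".join(out)

macros = {"LOG": "if debug:\n    print(x)"}
source = "def f(x):\n    %LOG%\n    return x"
print(expand_indented(source, macros))
```

The two-line macro body ends up nested correctly inside `f`, which is what makes this style of expansion usable for indentation-sensitive output such as Python or YAML.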


In 2014 I worked for a food delivery startup as sysadmin (pizza.de, now sold at least twice).

We used a bunch of shell scripts and m4 to manage a lot of linux servers and config-files in a similar fashion to what ansible does (which was not really a thing at the time).

It worked ok, but I hate the m4 syntax to this day.


Neat post, but I was confused about how they said they can't use pandoc, but they never explained in the post how they convert the markdown portion of their files to HTML.

Then I read their makefile on github and they're still using pandoc...

https://github.com/chrisman/chrisman.github.io/blob/a2a949f3...


> m4 is a core gnu utility

Actually, m4 is specified as part of the Single Unix Specification (IEEE Std 1003.1), GNU m4 being "only" one implementation of it, the original one supposedly being part of System V and published as part of OpenSolaris a couple years back (sadly, can't verify nor link to it as those resources seem to have vanished or placed behind paywalls/registrations).

m4 was heavily used (and I guess still is) for sendmail and DNS (BIND) configs. Don't know if it is actually usable as a grand unified mail config system over sendmail, postfix, and other SMTP servers, but in principle it could be.

For markup templating and static or dynamic site building, including HTML email templating, there are of course any number of much more specific tools as HNers will know. One that should appeal to m4 fans in that it's based on standards, and is kind of the least surprising given HTML's history, is mighty SGML itself with full HTML-aware, injection-free templating.

Edit: and m4 is prominently used by autotools of course


Once upon a time m4 had value as a guaranteed-available tool on Unix systems. But then Red Hat and Debian removed it from base systems, despite it being classified in POSIX as a mandatory shell utility, as opposed to utilities like c99, which is part of the Development Utilities extension set.

If you have to rely on the downstream user installing additional dependencies, that substantially changes the equation. You can either have them install a better tool, or remove the external dependency altogether. I personally think m4 is elegant, but most programmers seem to struggle with recursive string interpolation, at least on a practical level if not a conceptual level. And complex m4 programming quickly devolves to dynamic code generation, which is even more hostile. So I've learned to avoid it.

Autotools was prescient in this regard (or perhaps usefully conformant m4 implementations were spotty) as ./configure scripts rely on sed instead of m4 for templating, including for Makefile generation. While heavily used by autotools, m4 is used to generate ./configure, Makefile.in, and similar inputs that are statically included in the release rather than generated downstream.
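The configure-time step mentioned above amounts to replacing `@NAME@` placeholders in a `.in` file with configured values. A minimal Python sketch of that idea follows; it is not autoconf's actual implementation (which generates sed commands), and the `substitute` helper and the sample values are made up for illustration.

```python
import re

def substitute(template, values):
    """Replace @NAME@ when NAME is known; leave any other token untouched."""
    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)

makefile_in = "CC = @CC@\nprefix = @prefix@\n"
print(substitute(makefile_in, {"CC": "gcc", "prefix": "/usr/local"}), end="")
```

Because the substitution is a single non-recursive pass over fixed tokens, it needs nothing beyond what sed (or any regex tool) already provides, which fits the point about not requiring m4 downstream.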


Having used m4 in practice, it's easier to get the job done with a custom python script.


Having operated infrastructure that predated most modern templating systems, I remember with some amusement m4-based routing config generation that required recursive macros taking many minutes and GB of memory because there is no way to pass by reference or lazily evaluate.


More pythonbell!


M4 promised so much. There was a CP/M version too. But it was like Prolog, after one page of code you lost all control and comprehension. And enthusiasm.


I'm interested in learning about prolog. Where can I learn about the problems with it?


The problem with prolog is that it is incomplete. Every time I write this, prolog-fanboys become angry and start spewing out their 5-line "simple solutions":

    fib(X,Y+Z) :- fib(X-1,Y),fib(X-2,Z)


it speaks very poorly of the unix ecosystem that nothing to replace m4 with a more sensible syntax has become widely available.

If the unix CLI isn't going to slowly rot away, it needs to pick up improved versions of tools like this and start getting them into wide distribution.


I agree, something better has to be possible. Maybe it could begin with some simplifying assumptions to reduce the complexity of quoting.


I used it 25 years ago (!) to build a tool for provisioning servers from a configuration file, i.e. generating files in /etc using macro substitution. I'm sure the kids are doing this more elegantly today, but it sure saved us a lot of time!


Seconding other comments - assuming you can use Python, use jinja templates, slap on a twenty-line Python wrapper script to read your data and then apply it to the templates, and then you've saved yourself days of headache.
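For a sense of how small such a wrapper can be, here is a sketch that uses only the standard library. Note the swap: it uses `string.Template` as a stand-in for Jinja2 so that it runs without extra packages, which means only `$name` and `${name}` placeholders work, with no loops or filters. The `render` function and the sample data are hypothetical.

```python
import json
from string import Template

def render(template_text, data_json):
    """Load JSON data and substitute it into $name / ${name} placeholders."""
    data = json.loads(data_json)
    # substitute() raises KeyError on a missing key, which surfaces typos early.
    return Template(template_text).substitute(data)

print(render("server ${host} listens on port $port\n",
             '{"host": "example.internal", "port": 8080}'), end="")
```

With Jinja2 installed, the same shape applies: read the data, load the template, call its render method with the data.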


A nice demonstration of what M4 can do is M4BASIC: http://www.basic-converter.org/m4basic/.


Is there any large body of public C code that uses/used m4? I'd really like to see what the m4 equivalent of the original Bourne shell code or the APL Incunabulum would be.


We in BIRD (https://bird.network.cz/) use m4 code (https://gitlab.nic.cz/labs/bird/-/blob/master/filter/decl.m4) to implement the bytecode interpreter for our filtering language.


While Turing complete, and not merely in a highly technical, obtuse way (e.g. C++ templates), m4 nonetheless doesn't work well with dynamic or arbitrary inputs. I'm sure you can make it work using extensions like GNU m4's esyscmd and some preprocessing, but it would be ugly and even more impenetrable than typical recursive string interpolation code. To reimplement complex software you'd probably end up implementing a tiny VM in m4 and then targeting the VM. Though, I suppose m4 is far more accommodating in this regard than, e.g., ed or sed.

IOW, m4 is really only suited for fairly straight-forward static input-output transformations, such as common templating tasks. These transformations can themselves become quite complex and even convenient to implement (put your functional programming hat on), but once you stray from recursive string interpolation on lists of simple tokens sourced from statically defined input the ergonomics break down immediately.


The closest equivalent for macro processors is probably ML/I[0]. It’s implemented in macros[1]; you port it by defining macros that map it to a high level language or assembly language. William Waite’s Stage2[2], Robert Dewar’s Macro SPITBOL[3], and Macro SNOBOL4[4] are some other examples of systems implemented in macros.

[0] http://www.ml1.org.uk/index.html

[1] http://www.ml1.org.uk/implementation.html

[2] https://github.com/crandylb/stage2-1

[3] https://github.com/spitbol

[4] https://www.snobol4.org/csnobol4/curr/


How large does that body have to be? Autoconf consists of M4 macros, so all software that uses Autoconf uses m4.


m4 is used heavily in application build systems.


During my CS degree in 1988 I wrote a simple programming language that macro expanded (using m4) into lambda calculus that was then converted to SK combinators for evaluation.

edit: It was quite impressive just how amazingly inefficient (in time and space) this approach can be (the combinator part, not the m4 part which worked pretty well).


I too at one point suffered from a similar affinity towards 'minimalism'. Nothing good came out of it.

With things like go get and nix you can be more or less independent of package managers or even OS to some extent.


In my experience, the best (and the simplest) templating language is the one that uses the <% and %> brackets. This can be used with any programming language that has quoted string literals.
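The reason this scheme ports to any language with quoted string literals is that the template compiles mechanically: text outside the brackets becomes "append this string literal", and text inside is emitted as code. A small Python sketch of that translation follows. The `<%= expr %>` form, the `<% end %>` block terminator, and the helper names are assumptions for this example (Python needs some way to close an indented block), and since templates are executed as code, this should only ever be fed trusted input.

```python
import re

def compile_template(src):
    """Translate a <% %> template into Python source defining render()."""
    lines = ["def render():", "    out = []"]
    depth = 1
    # re.split with one capture group alternates: literal text, bracket contents, ...
    for i, piece in enumerate(re.split(r"<%(.*?)%>", src, flags=re.S)):
        pad = "    " * depth
        if i % 2 == 0:
            if piece:
                lines.append(f"{pad}out.append({piece!r})")  # text -> quoted literal
        elif piece.startswith("="):
            lines.append(f"{pad}out.append(str({piece[1:].strip()}))")
        elif piece.strip() == "end":
            depth -= 1                                        # close the current block
        else:
            stmt = piece.strip()
            lines.append(pad + stmt)
            if stmt.endswith(":"):
                depth += 1                                    # open a new block
    lines.append("    return ''.join(out)")
    return "\n".join(lines)

def render(src, **ctx):
    """Compile and run a template; ctx entries are visible as global names."""
    namespace = dict(ctx)
    exec(compile_template(src), namespace)
    return namespace["render"]()

print(render("<% for n in names: %>Hi <%= n %>!\n<% end %>Bye", names=["a", "b"]))
```

Printing `compile_template(...)` for a small template is a quick way to see the generated code and debug a template that misbehaves.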


As a US immigrant, I found this line funny

> m4 is great for macros and includes. Not super fun for general programming. But, like immigrants, it gets the job done.


It's a line from the musical Hamilton. “Immigrants, we get the job done”.


m4... sendmail config <shudder>

Only painful memories.


i have generally found this: http://www.cs.stir.ac.uk/~kjt/research/pdf/expl-m4.pdf to be pretty exhaustive / thorough coverage of m4.


No. Just no.

It's cool that you are comfortable with it (or at least for 90% of what you want it to do) but it is such an unwieldy tool that it gets painful very quickly.

As a curiosity for small projects it's okay. Seeing what's out there and extending one's view of the programming landscape is great.

But if you use it in a professional environment for heaven's sake please think of the maintainers that come after you and just don't use it.


I TA'd a couple assembly courses that used m4 as a preprocessor. Slight syntax errors result in some gnarly error messages, and students really struggled with it. Heck, I struggled with it sometimes.


You call it "unwieldy", other people call it "available everywhere and battle-tested". That we tend to implement slightly more aesthetic alternatives to existing, proven software instead of reusing stuff (and building for reuse), is probably the main reason why complex software is so bad (unstable, difficult to get accustomed with and maintain) nowadays.


A musket is also battle-tested, but it was replaced because it is slow, hard to operate, heavy, and in the end if you tried to continue using one, you lost.

M4 is similar. It's just a macro processor. If you try to make it work on a problem that requires more, it will backfire in uncomfortable ways.


Ah, the joy of cherry-picked analogies. Let's replace the musket with an AK-47, how does it look now?

Yes, M4 is a macro processor. A pretty decent one that is well-known and will not surprise you with incompatible changes or newly introduced bugs. Nobody suggested using it for more.


AK-47 had been superseded by AKM after less than 10 years in service. AKM was, in turn, superseded by AK-74. And AK-74 has been superseded by AK-12. The last transition, in particular, was heavily motivated by ergonomics and difficulty of adapting the original to typical modern circumstances of use.

So, I'd say the analogy still holds up pretty well.


> well-known

Hire twenty developers and ask how many know some M4 vs some Python.


I'd rather have someone learn to use M4 when needed than have "some Python" replace M4 in an important project. But that's just me, other people like building sand castles.


You gotta strive to use the right tool for the job, always. Using m4 as a template processor for HTML isn't it.


> That we tend to implement slightly more aesthetic alternatives to existing, proven software instead of reusing stuff (and building for reuse), is probably the main reason why complex software is so bad (unstable, difficult to get accustomed with and maintain) nowadays.

I don't think it's proven if most of the people that had to work with it complain about the complexity of doing a simple loop. I also don't think it's built for reuse, considering so few people actually use it. If anything, hard to use software like m4 is exactly what pushes everyone to think "I can do better!".


What font is this blog written with?



Right click and Inspect the element and you can see that it's using Fira Mono.


Even if it’s a Hamilton reference, it’s not a great one.


Meh. Most of these classic unix tools aren't particularly good examples of what they do, they're just old enough to have become established. I believe M4 was originally created to simplify the sendmail config file and then eventually pulled out into its own program?

Rather like this writer, I felt the need to make my blog entries markdown with minimum header and footer surrounding them. So I found a one-liner script that renders markdown in the browser. (Anyone using lynx gets the raw markdown instead, but that's probably no less friendly than what lynx could come up with, and also means that you can read it just as well with telnet (or I guess openssl s_client these days) if you don't have a browser around.)


> I believe M4 was originally created to simplify the sendmail config file and then eventually pulled out into its own program?

m4 predates sendmail by 6 years (1977, 1983).


IIRC, m4 was initially a preprocessor for Rational Fortran, as someone else said, many years before BSD and sendmail.


The 7th edition manual[1] says m4 was based on m3, which was based on the Software Tools macroprocessor. The source for that is at https://9p.io/cm/cs/who/bwk/toolsbook/index.html

1. e.g. https://wolfram.schneider.org/bsd/7thEdManVol2/m4/m4.pdf


Getting flashbacks to the earliest Unix/C book I had (from 1977 with 2nd ed from 1983 or so) already mentioning ratfor and m4. Never occurred to me that these two are related.


Agreed. A ton of people argue for them, but when you ask people to re-design them starting from scratch today, few people arrive at the design of the Bourne shell, for example.


Any chance you can link that script? I’ve been looking for something like this recently too.


> I found a one-liner script that renders markdown in the browser.

Care to share the one-liner?


I just looked at their page and they are using https://strapdownjs.com/

Not exactly a one liner, but it looks simple to use.


It's a one-liner in the sense that you add that one line script tag to your page and then it works.


fvwm


What's the connection between M4 and a window manager?


The entire purpose of the comment was to remark that there is one, obviously.

I didn't say "window manager", I named a specific piece of software by name, in a particular context.

It's a reasonable assumption that there is a significant connection of some sort between m4 and this window manager, or at least that my intention was to declare as much.

And what more should I have to say in this context than "fvwm uses it too"? Do you actually care about the details of exactly how fvwm uses m4? If you did, then googling "fvwm m4" gives you all that.


FVWM has a complex configuration format and back in the day (maybe even now?) some people would use macro processors to generate the configuration file.

In 2021, I wouldn't bother.


I thought the m4 files simply were the config files. This was something like redhat 3.x or 4.x? It was all m4 files to define the start menu contents and the dock and everything else.

I was new enough at the time that I guess those could have just been a front end to the actual config files without understanding that.

I just remember spending a lot of time fiddling with m4 files to get my fvwm and fvwm95 desktop just the way I wanted it.

It's nothing I chose, it's just the way it was already in the distribution.

edit... out of curiosity I've now googled "fvwm m4" and confirmed my memory. I was sort of mostly or half right. m4 is not strictly "the" config format, but it is a compile-time built-in feature that is generally always used, and old redhat in particular had a confusing setup.


Yes -- same with CPP as well. Fvwm2 allowed you to use the language features of M4/CPP to hold variables and to perform loops on different data, so you could generate complex config files that way, without needing to build that into the core config syntax itself.

However, in fvwm3, I've removed both those modules.


From the site:

> But, like immigrants, it gets the job done.

What the hell?


A line from the musical Hamilton. “Immigrants, we get the job done”. Received much cheer in the crowd.


Somebody made a joke. Shock!


It’s a quote from Hamilton.


Are you saying immigrants don't get the job done? :-D


Hamilton reference.


> Pandoc is honestly overkill for just converting markdown to HTML

Strongly disagree. Pandoc is the only solution I could find that outputs static HTML and supports maths. It's a really great solution. And it's pretty fast. I don't know in what way it could be overkill.

Using a hacky M4 script instead of Pandoc is completely insane. Fine if you want to do insane things I guess, but it's still insane.


The author clearly explained why they weren't able to use pandoc a couple lines before it.

> This arm64 system not only didn’t have pandoc installed, but pandoc wasn’t even in the software repository. I considered compiling it myself, but the language it is written in, Haskel, doesn’t even compile on arm64.

Then they justified it by explaining that their usecase didn't really need pandoc anyway.


It doesn't seem true though? Haskell has had AArch64 releases since at least 2017 - admittedly labelled as experimental, but it does compile.


He probably needs to cross-compile it. pandoc specifically has a history of not building on ARM boxes directly because it runs out of memory during the build.

And GHC support for ARM in general has been spotty. I know pandoc wouldn't build on Apple Silicon last time I tried to install it, but that was a few months ago.

That's a nice thing about C. It'll compile and run on basically anything.


I'm not gonna question it when someone says they ran into issues compiling something on a different architecture. God knows I've had enough obscure problems with it in the past.


And even without native code generator, GHC uses LLVM as a compiler backend for this architecture.


There are Pandoc binary releases for Arm64.

https://github.com/jgm/pandoc/releases

But it looks like this was before that (the title didn't have the 2020 when I read it).



