It's worth noting that M4 was invented by Brian Kernighan and Dennis Ritchie. GNU M4 might be what you're using, but it is definitely not the original version, and you probably shouldn't attribute M4 to GNU any more than you would attribute C to GNU because they made GCC.
"Invented" is a bit of a strong word here: there were "M" and "m3" macro processors before m4. Christopher Strachey's GPM (General Purpose Macrogenerator, https://academic.oup.com/comjnl/article/8/3/225/336044) was the first macro processor in which macros can occur anywhere in the source text, just like in m4. That makes it the first general-purpose program of its kind, rather than a macro processor for one specific programming language (like the preprocessor of a macro assembler).
M4 has some well-known uses, like Sendmail and GNU Autoconf, but I don’t think this counts in M4’s favor: in a standard Unix-like environment, there is really no good alternative to m4 for generic text macro expansion. You can try to force cpp (the C preprocessor) into this role, but it is really only suited to being part of a C compiler pipeline. For better or for worse, m4 is it in the world of Unix and shell tools, even though its syntax is quite annoying to use.
Of course, if you operate within a language ecosystem, you can probably find a nice templating system there. Python, for instance, has Jinja2. Or, if your needs are simple, you can use something more basic, like sed(1).
> M4 syntax looks that way because it has to work for arbitrary text files.
That seems like an odd assertion given the existence of other templating systems which have friendlier syntax while being capable of generating arbitrary output. Is there a reason why you think the M4 approach is valid?
Most templating systems work in combination with an external program written in a high-level language, e.g. Ruby for Liquid or Python for jinja2. m4 does not.
And m4 works in combination with a bunch of C code. Virtually any system with m4 can come with an interpreter for a higher level language without running out of space (I can't imagine many uses for m4 at runtime on systems where, say, a perl interpreter wouldn't fit). Why is relying on such a runtime a problem?
What does “autonomous” mean? There are a ton of template engines that don’t depend on a parent framework, there is no reason you can’t implement all features in C.
Jinja2 is an API: it needs a Python program to determine the values of the substitution variables and to invoke the template engine. In contrast, the M4 executable is the engine.
I use PHP when I need non-trivial templating. It has always struck me as odd that people would invent or use their own templating engines (think HCL, Jinja, Smarty, m4). All of these are implementable as either a PHP file or a Haskell interface.
The context I know M4 from is generating `configure` scripts that generate `Makefile`s that generate programs from a single multi-platform source tree. I can only offer my utmost respect to those who master that multi-level complexity (some call it “hell”).
> I can only offer my utmost respect to those who master that multi-level complexity (some call it “hell”).
I also have massive respect for those people. I only understood the point of autoconf when I ended up reimplementing a fraction of it in GNU Make macros.
For simple templating I use my own version[1] of the pp[2] preprocessor. The idea behind it is ridiculously simple: everything between lines matching ^#!$ (a line containing only #!) is shell script, and the script's output is pasted verbatim into the document.
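To make the idea concrete, here is a minimal Bash sketch of that kind of preprocessor. It is a hypothetical reimplementation, not the code behind [1] or [2]; the function name `pp`, running each block with `bash -c`, and feeding it /dev/null as stdin are my own assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical pp-style preprocessor (a sketch, not the linked projects' code).
# A line consisting solely of "#!" opens a shell block and the next such line
# closes it. Text outside blocks is copied verbatim; each block is run with
# bash and its standard output is pasted in its place.
pp() {
  local line script="" in_script=0
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == '#!' ]]; then
      if (( in_script )); then
        # Closing marker: run the collected script with stdin redirected from
        # /dev/null so it cannot swallow the rest of the template.
        bash -c "$script" < /dev/null
        script=""
        in_script=0
      else
        in_script=1
      fi
    elif (( in_script )); then
      script+="$line"$'\n'
    else
      printf '%s\n' "$line"
    fi
  done
  if (( in_script )); then
    echo "pp: unterminated #! block" >&2
    return 1
  fi
}
```

Usage would be something like `pp < page.md.in > page.md`.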
I wrote something similar in my static collection of sysadmin tools - https://github.com/skx/sysbox - In my simple pre-processor I only allow two special things:
#include "file/goes/here"
#execute ls -l | wc -l
There is also a more complex version included, which supports the full syntax of Go's text/template library and is more powerful.
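For comparison, a rough Bash equivalent of the two-directive preprocessor might look like the sketch below. sysbox itself is written in Go, so this is not its implementation; processing included files recursively and running `#execute` lines through `bash -c` are assumptions, and include cycles are not detected.

```bash
#!/usr/bin/env bash
# Hypothetical two-directive preprocessor (a sketch, not sysbox's Go code):
#   #include "path"    -> replaced by the preprocessed contents of path
#   #execute command   -> replaced by the command's standard output
# All other lines pass through unchanged. Relative paths are resolved from
# the current directory, and include cycles are not detected.
preprocess() {
  local line path
  local include_re='^#include[[:space:]]+"([^"]+)"[[:space:]]*$'
  local execute_re='^#execute[[:space:]]+(.+)$'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $include_re ]]; then
      path=${BASH_REMATCH[1]}
      # Recurse so that directives inside included files are expanded too.
      preprocess < "$path" || return 1
    elif [[ $line =~ $execute_re ]]; then
      bash -c "${BASH_REMATCH[1]}" < /dev/null
    else
      printf '%s\n' "$line"
    fi
  done
}
```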
I needed a simple templating tool recently and thought about m4, but found myself to be too busy to dive into it deeply enough. In the end, I just used a Bash script with a heredoc, using jq/yq to read from JSON/YAML. A poor man's Jinja2:
#!/usr/bin/env bash
declare SOMETHING="$(yq eval ".yaml.query" whatever.yml)"
cat <<EOF
Here is my template. It says ${SOMETHING}.
EOF
It's not pretty, but it works! You can obviously shell out to anything in the heredoc, so it's quite flexible. Imports that respect the variables might be a bit tricky; the easiest approach is probably to keep them as separate scripts like the one above and use environment variables to pass information around.
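A minimal sketch of that suggestion: each partial is its own heredoc script that reads its inputs from environment variables, and the parent template inlines the partial's output with command substitution. The file `header.sh`, the variable `TITLE` and the `render_page` function are made up for illustration, and the partial is written to a temporary directory only to keep the sketch self-contained.

```bash
#!/usr/bin/env bash
# Sketch: heredoc "partials" as separate scripts fed through environment
# variables. header.sh, TITLE and render_page are hypothetical names.

partials=$(mktemp -d)

# Write the partial. The quoted 'PARTIAL' delimiter keeps ${TITLE...} literal
# now; it is expanded later, when the partial itself runs.
cat > "$partials/header.sh" <<'PARTIAL'
#!/usr/bin/env bash
cat <<EOF
<head><title>${TITLE:?TITLE must be set}</title></head>
EOF
PARTIAL

render_page() {
  local title=$1 body=$2
  # TITLE is placed in the partial's environment for this one command only.
  cat <<EOF
<html>
$(TITLE="$title" bash "$partials/header.sh")
<body>${body}</body>
</html>
EOF
}
```

Calling `render_page "Hello" "It works"` prints the page with the header partial inlined.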
I had the... pleasure of meeting M4 while working on GNU Bison (a parser generator): after parsing the grammar, it calls M4 to generate the parser (in C, C++, Java, D or others), defining some variables.
So you get a giant C/C++ file with a parser implementation made of M4 definitions, C preprocessor definitions, and includes of generated (lookup tables) or user-written (custom parsing actions) files, and it's absolute hell to debug :D
m4 is pretty neat, but debugging it is very difficult. It takes a long time to understand when a macro is expanded and how quoting changes that. It could also use a standard library of convenient string manipulation functions, perhaps a dictionary data structure. But at that point you can probably just switch to a more featureful programming language.
This was right after m4 was called gross and was said only to be good for special purposes, that it was to be avoided in general and better alternatives should be used instead. Then it says m4 is like immigrants. If it's a song reference, it's a rather poorly chosen one.
As a European reading this it sounds like the author has some kind of narcissistic take on exactly what kind of immigrants their country has and where they "belong".
Of course, what the author wrote and what I understood needn't align; we're different people with different cultural references. But this stood out to me enough that I stopped reading and came back to HN to find out whether anyone else had noticed the same casual use of what seems like inappropriately dismissive language.
Yeah, this is probably a cultural mix-up. There's a fair number of people in the US who view and characterize immigrants as freeloaders, when the truth is they work harder than "natives" and our economy depends on them. And as a sibling comment says, it's a reference to a line from Hamilton, which in lots of ways celebrates the US's diverse heritage.
Then I discovered Lisp so I probably lost interest in MPP because of that.
One distinguishing feature of MPP is that it preserves indentation when generating code: if a macro generates multiple lines of code and the macro call is indented, then its output gets indented too. So in theory you can use it for Python, YAML and their ilk.
Interestingly, without any prior Lisp knowledge, I based this project on nested list processing. It has programmable read tables, like Common Lisp, and also lexically scoped local macros, like macrolet. Lisp showed me a vastly better designed architecture for all the infrastructure that I had greenspunned inside MPP, which made continuing that project seem pointless compared with pouring more effort into ramping up on Lisp.
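The indentation rule is easy to state in code. Below is a minimal Bash sketch of the idea (not MPP's code): a line of the form `<indent>@name args` calls a shell function `macro_name`, and every line the macro prints is re-indented with the caller's `<indent>`. The `@` call syntax, the `macro_` prefix and the example macro are all made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch of indentation-preserving macro expansion (illustrative only).
# "<indent>@name args" calls macro_name with the whitespace-split args and
# prefixes each output line with <indent>; unknown macros pass through as-is.

macro_fields() {
  # Example multi-line macro: one YAML "key: null" line per argument.
  local field
  for field in "$@"; do
    printf '%s: null\n' "$field"
  done
}

indent_expand() {
  local line indent name out
  local -a args
  local call_re='^([[:space:]]*)@([A-Za-z_][A-Za-z0-9_]*)(.*)$'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $call_re ]] && declare -F "macro_${BASH_REMATCH[2]}" > /dev/null; then
      indent=${BASH_REMATCH[1]}
      name=${BASH_REMATCH[2]}
      # Plain word splitting: no quoting or nested calls in this sketch.
      read -r -a args <<< "${BASH_REMATCH[3]}"
      "macro_$name" "${args[@]}" | while IFS= read -r out || [[ -n $out ]]; do
        printf '%s%s\n' "$indent" "$out"
      done
    else
      printf '%s\n' "$line"
    fi
  done
}
```

With this, an indented `@fields host port` inside a YAML mapping expands to two keys at the same indentation level.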
Probably for the same reason, I never migrated that mpp.html page and its sub-content to my newer home page locations. I have all the code in version control and all. I'm always having to refer to it via archive.org.
I posted about MPP in the comp.lang.c newsgroup in 2008, when John Thingstad started a thread with my name in the subject line, asking:
I have noticed you have become dramatically better in Lisp in a relative
short period of time. What is you "secret"?
In 2014 I worked for a food delivery startup as sysadmin (pizza.de, now sold at least twice).
We used a bunch of shell scripts and m4 to manage a lot of Linux servers and config files in a similar fashion to what Ansible does (which was not really a thing at the time).
It worked ok, but I hate the m4 syntax to this day.
Neat post, but I was confused: they said they can't use pandoc, but they never explained in the post how they convert the Markdown portions of their files to HTML.
Then I read their makefile on github and they're still using pandoc...
Actually, m4 is specified as part of the Single UNIX Specification (IEEE Std 1003.1), GNU m4 being "only" one implementation of it. The original one was supposedly part of System V and was published as part of OpenSolaris a couple of years back (sadly, I can't verify or link to it, as those resources seem to have vanished or been placed behind paywalls/registrations).
m4 was heavily used (and I guess still is) for sendmail and DNS (BIND) configs. I don't know if it is actually usable as a grand unified mail config system over sendmail, postfix, and other SMTP servers, but in principle it could be.
For markup templating and static or dynamic site building, including HTML email templating, there are of course any number of much more specific tools, as HNers will know. One that should appeal to m4 fans, in that it's based on standards and is kind of the least surprising given HTML's history, is mighty SGML itself, with full HTML-aware, injection-free templating.
Edit: and m4 is prominently used by autotools of course
Once upon a time m4 had value as a guaranteed-available tool on Unix systems. But then Red Hat and Debian removed it from base systems, despite it being classified in POSIX as a mandatory shell utility, as opposed to utilities like c99, which is part of the Development Utilities extension set.
If you have to rely on the downstream user installing additional dependencies, that substantially changes the equation. You can either have them install a better tool, or remove the external dependency altogether. I personally think m4 is elegant, but most programmers seem to struggle with recursive string interpolation, at least on a practical level if not a conceptual level. And complex m4 programming quickly devolves to dynamic code generation, which is even more hostile. So I've learned to avoid it.
Autotools was prescient in this regard (or perhaps usefully conformant m4 implementations were spotty): ./configure scripts rely on sed instead of m4 for templating, including for Makefile generation. While autotools uses m4 heavily, it does so upstream, to generate ./configure, Makefile.in, and similar inputs that are shipped statically in the release rather than generated downstream.
Having operated infrastructure that predated most modern templating systems, I remember with some amusement m4-based routing config generation that required recursive macros taking many minutes and gigabytes of memory, because there is no way to pass by reference or to evaluate lazily.
M4 promised so much. There was a CP/M version too. But it was like Prolog: after one page of code you lost all control and comprehension. And enthusiasm.
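For readers who haven't looked inside a generated configure script: the substitution step amounts to replacing `@NAME@` placeholders in files like Makefile.in with values determined at configure time. Here is a minimal Bash sketch of that idea using sed. It is a simplification, not the code config.status actually contains; the `substitute` function name is made up, and values containing newlines are not supported.

```bash
#!/usr/bin/env bash
# Sketch of config.status-style substitution (simplified): replace each
# @NAME@ on stdin with the value of the shell variable NAME, for every NAME
# given as an argument. Values containing newlines are not supported.
substitute() {
  local name value escaped
  local -a sed_args=()
  if (( $# == 0 )); then
    cat    # nothing to substitute: copy input unchanged
    return
  fi
  for name in "$@"; do
    value=${!name}    # indirect expansion: the value of the variable named $name
    # Escape the characters that are special in a sed replacement
    # (backslash and &) plus the | delimiter used below.
    escaped=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
    sed_args+=(-e "s|@${name}@|${escaped}|g")
  done
  sed "${sed_args[@]}"
}
```

For example, `prefix=/usr/local substitute prefix < Makefile.in > Makefile` would fill in every `@prefix@`.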
The problem with Prolog is that it is incomplete.
Every time I write this, Prolog fanboys get angry and start spewing out their 5-line "simple solutions":
It speaks very poorly of the Unix ecosystem that nothing replacing m4 with a more sensible syntax has become widely available.
If the Unix CLI isn't going to slowly rot away, it needs to pick up improved versions of tools like this and start getting them into wide distribution.
I used it 25 years ago (!) to build a tool for provisioning servers from a configuration file, i.e. generating files in /etc using macro substitution. I'm sure the kids are doing this more elegantly today, but it sure saved us a lot of time!
Seconding other comments: assuming you can use Python, use Jinja templates, slap on a twenty-line Python wrapper script to read your data and apply it to the templates, and you've saved yourself days of headaches.
Is there any large body of public C code that uses/used m4? I'd really like to see what the m4 equivalent of the original Bourne shell code or the APL Incunabulum would be.
While Turing complete, and not merely in a highly technical, obtuse way (e.g. C++ templates), m4 nonetheless doesn't work well with dynamic or arbitrary inputs. I'm sure you can make it work using extensions like GNU m4's esyscmd and some preprocessing, but it would be ugly and even more impenetrable than typical recursive string interpolation code. To reimplement complex software you'd probably end up implementing a tiny VM in m4 and then targeting the VM. Though I suppose m4 is far more accommodating in this regard than, e.g., ed or sed.
IOW, m4 is really only suited for fairly straightforward static input-output transformations, such as common templating tasks. These transformations can themselves become quite complex and even convenient to implement (put your functional programming hat on), but once you stray from recursive string interpolation on lists of simple tokens sourced from statically defined input, the ergonomics break down immediately.
The closest equivalent for macro processors is probably ML/I[0]. It’s implemented in macros[1]; you port it by defining macros that map it to a high level language or assembly language. William Waite’s Stage2[2], Robert Dewar’s Macro SPITBOL[3], and Macro SNOBOL4[4] are some other examples of systems implemented in macros.
During my CS degree in 1988 I wrote a simple programming language that macro expanded (using m4) into lambda calculus that was then converted to SK combinators for evaluation.
edit: It was quite impressive just how amazingly inefficient (in time and space) this approach can be (the combinator part, not the m4 part which worked pretty well).
In my experience, the best (and the simplest) templating language is the one that uses the <% and %> brackets. This can be used with any programming language that has quoted string literals.
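The claim that this style works with any language that has quoted string literals is easy to demonstrate: compile literal text into print statements of quoted strings, and paste the code between `<%` and `%>` in between. The sketch below does this with Bash itself as the target language, using `printf %q` to produce the quoted literals. It is illustrative only (no `<%=` shorthand, no way to escape the delimiters), and the function names are made up.

```bash
#!/usr/bin/env bash
# Sketch of a "<% ... %>" template compiler whose target language is bash.
# Literal text becomes `printf %s <quoted literal>`; code between <% and %>
# is copied into the generated program unchanged.
compile_template() {
  local rest=$1 literal code
  while [[ $rest == *'<%'* ]]; do
    literal=${rest%%'<%'*}    # text before the next "<%"
    rest=${rest#*'<%'}        # drop everything up to and including "<%"
    if [[ $rest != *'%>'* ]]; then
      echo "compile_template: unterminated <%" >&2
      return 1
    fi
    code=${rest%%'%>'*}       # the code up to the closing "%>"
    rest=${rest#*'%>'}        # continue after "%>"
    printf 'printf %%s %q\n' "$literal"
    printf '%s\n' "$code"
  done
  printf 'printf %%s %q\n' "$rest"    # trailing literal text
}

render_template() {
  local program
  program=$(compile_template "$1") || return 1
  bash -c "$program"
}
```

Because control flow is just pasted code, a loop such as `<% for i in 1 2 3; do %>[<% printf %s "$i" %>]<% done %>` repeats the literal text between its brackets.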
It's cool that you are comfortable with it (or at least with 90% of what you want it to do), but it is such an unwieldy tool that it gets painful very quickly.
As a curiosity for small projects it's okay. Seeing what's out there and extending one's view of the programming landscape is great.
But if you use it in a professional environment for heaven's sake please think of the maintainers that come after you and just don't use it.
I TA'd a couple assembly courses that used m4 as a preprocessor. Slight syntax errors result in some gnarly error messages, and students really struggled with it. Heck, I struggled with it sometimes.
You call it "unwieldy", other people call it "available everywhere and battle-tested". That we tend to implement slightly more aesthetic alternatives to existing, proven software instead of reusing stuff (and building for reuse), is probably the main reason why complex software is so bad (unstable, difficult to get accustomed with and maintain) nowadays.
A musket is also battle-tested, but it was replaced because it is slow, hard to operate, heavy, and in the end if you tried to continue using one, you lost.
M4 is similar. It's just a macro processor. If you try to make it work on a problem that requires more, it will backfire in uncomfortable ways.
Ah, the joy of cherry-picked analogies. Let's replace the musket with an AK-47, how does it look now?
Yes, M4 is a macro processor. A pretty decent one that is well-known and will not surprise you with incompatible changes or newly introduced bugs. Nobody suggested using it for more.
AK-47 had been superseded by AKM after less than 10 years in service. AKM was, in turn, superseded by AK-74. And AK-74 has been superseded by AK-12. The last transition, in particular, was heavily motivated by ergonomics and difficulty of adapting the original to typical modern circumstances of use.
So, I'd say the analogy still holds up pretty well.
I'd rather have someone learn to use M4 when needed than have "some Python" replace M4 in an important project. But that's just me, other people like building sand castles.
> That we tend to implement slightly more aesthetic alternatives to existing, proven software instead of reusing stuff (and building for reuse), is probably the main reason why complex software is so bad (unstable, difficult to get accustomed with and maintain) nowadays.
I don't think it's proven if most of the people that had to work with it complain about the complexity of doing a simple loop. I also don't think it's built for reuse, considering so few people actually use it. If anything, hard to use software like m4 is exactly what pushes everyone to think "I can do better!".
Meh. Most of these classic unix tools aren't particularly good examples of what they do, they're just old enough to have become established. I believe M4 was originally created to simplify the sendmail config file and then eventually pulled out into its own program?
Rather like this writer, I felt the need to make my blog entries Markdown with a minimal header and footer surrounding them. So I found a one-liner script that renders Markdown in the browser. (Anyone using lynx gets the raw Markdown instead, but that's probably no less friendly than what lynx could come up with, and it also means that you can read it just as well with telnet, or I guess openssl s_client these days, if you don't have a browser around.)
Getting flashbacks to the earliest Unix/C book I had (from 1977 with 2nd ed from 1983 or so) already mentioning ratfor and m4. Never occurred to me that these two are related.
Agreed. A ton of people argue for them, but when you ask people to redesign them from scratch today, few arrive at the design of the Bourne shell, for example.
The entire purpose of the comment was to remark that there is one, obviously.
I didn't say "window manager", I named a specific piece of software by name, in a particular context.
It's a reasonable assumption that there is a significant connection of some sort between m4 and this window manager, or at least that my intention was to declare as much.
And what more should I have to say in this context than "fvwm uses it too"? Do you actually care about the details of exactly how fvwm uses m4? If you did, then googling "fvwm m4" gives you all that.
FVWM has a complex configuration format and back in the day (maybe even now?) some people would use macro processors to generate the configuration file.
I thought the m4 files simply were the config files. This was something like Red Hat 3.x or 4.x? It was all m4 files defining the start menu contents, the dock and everything else.
I was new enough at the time that I guess those could have just been a front end to the actual config files without my understanding that.
I just remember spending a lot of time fiddling with m4 files to get my fvwm and fvwm95 desktop just the way I wanted it.
It's nothing I chose, it's just the way it was already in the distribution.
edit... out of curiosity I've now googled "fvwm m4" and confirmed my memory; I was mostly, or at least half, right. m4 is not strictly "the" config format, but it is a compile-time built-in feature that is generally always used, and old Red Hat in particular had a confusing setup.
Yes -- same with CPP as well. Fvwm2 allowed you to use the language features of M4/CPP to hold variables and to perform loops on different data, so you could generate complex config files that way, without needing to build that into the core config syntax itself.
However, in fvwm3, I've removed both those modules.
> Pandoc is honestly overkill for just converting markdown to HTML
Strongly disagree. Pandoc is the only solution I could find that outputs static HTML and supports maths. It's a really great solution. And it's pretty fast. I don't know in what way it could be overkill.
Using a hacky M4 script instead of Pandoc is completely insane. Fine if you want to do insane things, I guess, but it's still insane.
The author clearly explained why they weren't able to use pandoc a couple lines before it.
> This arm64 system not only didn’t have pandoc installed, but pandoc wasn’t even in the software repository. I considered compiling it myself, but the language it is written in, Haskell, doesn’t even compile on arm64.
Then they justified it by explaining that their use case didn't really need pandoc anyway.
He probably needs to cross-compile it. pandoc specifically has a history of not building on ARM boxes directly because it runs out of memory during the build.
And GHC support for ARM in general has been spotty. I know pandoc wouldn't build on Apple Silicon last time I tried to install it, but that was a few months ago.
That's a nice thing about C. It'll compile and run on basically anything.
I'm not gonna question it when someone says they ran into issues compiling something on a different architecture. God knows I've had enough obscure problems with it in the past.