Notes on the M4 Macro Language (2008) (mbreen.com)
92 points by luu on April 3, 2020 | 59 comments



I've become a huge fan of M4. At work, we needed to create a container that would create templated configuration files based on environment variables. We were considering using Mustache or other existing template systems, but most of them are fairly closely coupled to a programming language. We didn't want to tie container creation to a language runtime, and M4 seemed to be the most mature, well-documented (albeit crusty) language-agnostic macro system.

M4 was easy to use, and unless you do something crazy, fairly easy to read. I ended up moving most of the actual envar reading/processing logic to a single base m4 file, and so the actual templates would import all necessary variables from the base m4 file and largely keep the original structure.

I wish there were a more "modern" (read: ergonomic) descendant of M4, but M4 is easy to understand, flexible, extensively (if not well) documented, stable, and fast enough for most of my use cases.


M4 has some well-known uses like Sendmail and GNU Autoconf, but I don’t think this can be counted in M4’s favor: for operating in a standard Unix-like environment, there is really no good alternative to m4 for generic text macro expansion. You can try to force cpp – the C preprocessor – into this, but it is poorly suited to use outside a C compiler pipeline. For better or for worse, m4 is it in the world of Unix and shell tools, even though its syntax is quite annoying to use.

Of course, if you operate within a language ecosystem, you can probably find a nice templating system there. Python, for instance, has Jinja2. Or, if your needs are simple, you can use something more basic, like sed(1).


I've looked at m4 before, and want to love it, but it just seems a bit too clumsy to use. And unfortunately, IIRC, unlike sed it's not ubiquitously available.

Not really crazy about Jinja2 either, but at this point it seems like it has a lot more mindshare and maintenance.


M4, as part of POSIX, should be as widely available as sed, no?


In principle, yes. In practice, it seems like it's often not installed on the random hosts I encounter. This isn't a fatal flaw, but it is a small advantage in favor of sed and awk and probably even perl, which are essentially ubiquitous these days.


Mustache / Handlebars is pretty widely available across languages these days too and is pretty basic. Doesn't help you at the default POSIX level, but there are even command-line versions IIRC.


Last time I used M4 it was for a multi-language system where I needed to substitute in PHP, shell scripts, Apache conf files, etc.


Plain shell is pretty good for templating, and it's the most available language on Unix. You have here docs for long literals, and you can choose to use $var expansion or not with EOF or 'EOF' (yes it's a confusing syntax).

You have loops and conditionals, and functions that take parameters to modularize it (e.g. a function for each table row).

Sequential composition "concatenates" stdout. At the end you redirect it all like this, which is inconvenient in other languages:

    top-level-function > out.html
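
A minimal sketch of that style. The function names (table_row, page_footer, render_page) and the sample data are made up for illustration and are not taken from any real script:

```shell
#!/usr/bin/env bash
# Hypothetical example: all names and data here are illustrative assumptions.

# One function per repeated fragment. The unquoted EOF lets $1 and $2 expand.
table_row() {
  cat <<EOF
<tr><td>$1</td><td>$2</td></tr>
EOF
}

# Quoting the delimiter ('EOF') turns expansion off, so $5 stays literal.
page_footer() {
  cat <<'EOF'
<p>Price: $5 (not expanded)</p>
EOF
}

# Commands run in sequence simply concatenate their stdout.
render_page() {
  local version
  echo '<table>'
  for version in 0.1 0.2; do
    table_row "$version" "release-$version.tar.gz"
  done
  echo '</table>'
  page_footer
}

render_page   # in practice: render_page > out.html
```

Nothing in the body needs to know where the output goes, so the single redirect stays at the top level.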
----

For example, I generate this page with shell:

http://www.oilshell.org/releases.html

And it's separated into 3-4 functions here:

https://github.com/oilshell/oil/blob/master/devtools/release...

----

Shell falls down when you have to generate complex logic as with tables [1], but m4 also falls down there.

I actually experimented with many different methods in order to help me design the Oil language.

This is generated with awk, and it has some advantages over shell:

https://www.oilshell.org/release/0.8.pre3/test/spec.wwz/osh....

These two are generated with Python:

https://www.oilshell.org/release/0.8.pre3/test/spec.wwz/alia...

https://www.oilshell.org/release/0.8.pre3/test/wild.wwz/

So I hope Oil will do better than shell, awk, and Python.

-----

Shell needs to be better for escaping. Template languages like Jinja do better there. I put m4 in the "70's-style macro processing bucket" which is generally sloppy about escaping.

I use Python for escaping here, but I do it with O(1) processes and not one process for every tiny fragment that needs to be escaped!

Git Log in HTML: A Harder Problem and A Safe Solution http://www.oilshell.org/blog/2017/09/29.html


M4 is so impotent and cripplingly difficult to use, that it led me to create my own dumb templating engine[0] because there is no standard POSIX tool that will let you, simply and in the dumbest way possible, substitute strings within a file.

If you try using awk or sed, you will need to figure out a magical way to escape whatever arbitrary value you are trying to replace, so you don't break sed's regex or awk's syntax.

M4 was one of the last things I tried, and it was an utter disaster. If any of M4's magical characters collides with the syntax of the file you're trying to template: tough luck. And there is no way to escape your file for M4 either, its syntax doesn't support that.

I eventually discovered that Bash 4.4+ has some built-in functions for "dumb" string substitution[1], which is exactly what's missing from the default POSIX tools.

Since then I wrote this little wrapper around that functionality, as a simple, single-file, dependency-free Bash script. If you find something like that useful, it's on the link below:

[0] https://github.com/luizberti/templ

[1] https://www.tldp.org/LDP/abs/html/string-manipulation.html
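
For what it's worth, the "dumb" substitution from [1] can be sketched in a few lines of Bash. This is an illustrative sketch only: `substitute` and the `{{URL}}` placeholder are hypothetical names, and this is not the templ script from [0].

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the templ script): replace every literal
# occurrence of KEY in TEXT with VALUE, with no regex or glob semantics.
substitute() {
  local text=$1 key=$2 value=$3 result
  # Quoting "$key" makes the pattern match literally; quoting "$value"
  # keeps characters such as & literal in the replacement.
  result=${text//"$key"/"$value"}
  printf '%s\n' "$result"
}

substitute 'url = {{URL}}' '{{URL}}' 'https://example.com/?a=1&b=[2]*'
# prints: url = https://example.com/?a=1&b=[2]*
```

The value never passes through sed's regex or awk's syntax, so there is nothing to escape.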


`envsubst` is not POSIX, but it is my preferred tool for such things: https://linux.die.net/man/1/envsubst

It's part of `gettext`: https://www.gnu.org/software/gettext/manual/html_node/envsub...


There is likely already a simple templating program installed on your Linux OS: envsubst. It is included as part of the gettext-base package on Ubuntu and installed by default, I believe.

https://www.gnu.org/software/gettext/manual/html_node/envsub...

The templating language looks similar in capability to that provided by templ.


Would perl or ruby not suit your requirements?


Ruby most definitely wouldn't; it comes pre-installed on almost no AMI/server/Docker image/etc.

As for perl, it does come "with the system" more often than Ruby, but also not as often as bash.

Even if it did, I reckon bash is something that nowadays people are more accustomed to than perl, and would consider it more of a no-brainer.


I always remember m4 encounters from autoconf struggles of building things from source a long time ago (Slackware 6).

Back then I never cared about m4. Today, I find it interesting to understand what problem it's trying to solve. I don't feel like it's practical anymore, but I think it is important not to discard old tools and to understand them.


https://en.wikipedia.org/wiki/M4_(computer_language)#History

I see it as a more powerful version of CPP. It might be somewhat based on it. CPP seems to have been developed in 1972-3[1], while M4 was developed in 1977.

Today, I think we're using template systems like Mustache, Handlebars, ERB, Jinja, etc. where otherwise macro systems would be used. The difference between macro systems / preprocessors and templates is that the latter comes with syntax (like <%= %> for ERB) that always makes it clear what's handled by the template system. With M4 and CPP, it's implicit and instead you have syntax (`' or []) to exclude stuff from being handled by the macro system.

I think that change happened because, before, macro systems were meant to be used mainly with programming languages to avoid repetition. Then, programming languages became concise and more flexible, so they didn't need to be handled by another language the way C is handled by CPP. Our need changed to avoiding repetition in data files like HTML.

Perhaps having implicit expansion when used with an underlying programming language is not so much an issue, or at least it could be handled with a convention like how CPP variables and functions are normally capitalized. With data files, I suppose being able to differentiate between what's HTML and what's ERB was probably more important.

[1] https://softwareengineering.stackexchange.com/questions/3093...


The pattern I’ve fallen into with m4 is to have a preamble that sets up whatever definitions I’m going to need, redefine the quotes “backwards”, and then place a copy of the new close quote on its own line to start the main body of the file. It then acts very similarly to how you describe templating languages, where you mark the parts that should be parsed instead of the parts that shouldn’t.


Oh. Oh, that's just brilliant. (The example in the other subthread makes it clear.)


I'm intrigued, could you give a small example please?


Not right now, unfortunately; my computer is at the office and I don’t use m4 enough to give an example from memory. I’ll try to remember next time I go in.


I imagine it's something like this?

  $ m4 << EOF
  changequote(],[)dnl
  define(foo,bar)dnl
  ]Lorem ipsum

  foo
  [foo]

  Lorem ipsum[
  EOF
  Lorem ipsum

  foo
  bar

  Lorem ipsum


That looks about right; I’m usually outputting formats that don’t care about blank lines and so skip the “dnl”s and put the start and end quotes on their own lines for readability.


Heh. Never seen that before. I'll have to have a play. Thanks both.



Back in 2002 I wrote a documentation templating system in m4. It provided a lot of the same tags as markdown, and generated HTML. In retrospect I have no idea why I wrote it in m4 - I think it was a "because I can" kind of thing. I also had a system for ripping CDs written in GNU make, so let's say I had an odd taste in scripting tools.


I'm curious if anyone is still using this tool today. Back when I used it (~8-ish years ago) I found it to be a godsend.


It's used by autotools, and about 15 years ago I used it to generate BIND (DNS server) configs, and sendmail.conf files, too. But I'd rather not use it if I don't have to :) m4 is still POSIX, though, and maybe someone can elaborate on m4 and tell us about its strengths in times of JSON and (ya|to)ml configs, which I also don't like, perhaps because they're misnamed IMO as "markup languages".


For JSON, see jq (https://stedolan.github.io/jq).


This too is one of the ways I was using it back then. Working in a small CMS shop we were responsible for hosting Drupal sites with custom DNS, and all that. We used m4 to build out the BIND configs, Drupal config files, logging configs, and Apache configs.


I wrote a large package of macros for a niche assembler using M4. It has some sophisticated things like a code generator for non-infix expressions. M4 was a blessing and a curse. It is very powerful and saved me from implementing my own macro language but debugging is not for the faint of heart once you get into anything complicated.

https://github.com/kevinpt/opbasm/blob/master/opbasm/picobla...

Michael Breen's notes are indispensable documentation on how to make effective use of the language.


I'm a fan of M4, and I don't know what these are doing but it surely looks too complex. You have defines within defines and OMG just an extract

  .. (just a snip follows)
  `define(`_'''$`'1```_LENGTH',strlenc(''$`'2``))'dnl
  `; "''$`'2``"'
  `ifdef(`PB3', ''$`'1```: calltable($4, $1, estr('''$`'2```))dnl
  return',dnl
  `table '''$`'1```#, [dec2pbhex(cstr('''$`'2```))]
  '''$`'1```: loadaddr($2, $3, _'''$`'1```_STR)
  jump __`'$5`'_handler
  _'''$`'1```_STR: load&return $1, '''$`'1```#')'
  )')'

I have to ask if this is right, or whether a parser would have been a whole lot less work. Credit for your skillz, but that looks like so much pain.


These are macros that generate user configurable codegen macros. I couldn't hard code register assignments so everything dynamically generated is prepared with these monstrosities. They also can't have fixed names since some need to be invoked multiple times with different parameters.

FWIW I didn't intend for things to go this far. I just started with some simple macros and it snowballed.


I work for a telecom where we've used it to generate network configs for a long time. We just got rid of it because it's impossible to hire people who know it, and nobody (new hires or existing people) wants to learn it either.


Some years ago at a former employer I used it for generating some Java 5 code. I needed (as I hazily recall) an interface of accessors, an immutable implementation, and a factory for the same couple dozen members. It was a lot of boilerplate and the list was changing a fair amount over the course of development. It worked great, and I got it plugged into the ant build without too much difficulty.

I'm told the person who took over that codebase was equal parts perplexed and horrified by it. They ripped out the m4 code generation and just put the generated files directly in the source tree. As the codebase had stabilized, the purpose that m4 served was basically gone.

So, not still in use, but if I found myself in the same situation today, I'd explore other options and consider doing the same thing. Interface default methods might cover some of the need at this point, but it's so far back, I'm hazy on the details.

The nice thing about doing this sort of code generation with m4 rather than the C preprocessor is that m4 preserves formatting (like line breaks) a lot better. Even if you're not using any of the other functionality of m4 (and there's a lot), that alone makes figuring out where things went wrong easier.


I do, from time to time. It's nice to have a preprocessor for normal text files. Yet, most often I find it more practical to use sed or awk for my needs.


I dusted it off this semester to help with some complicated reports I needed to make. I needed to include lots of graphs from simulation runs with various parameters. I encoded the parameter values in the graphs’ filenames, and used m4 to write the corresponding gnuplot control file.

After tying everything together with make, I could include plots for any parameter as I was writing the report and have the relevant data generated as necessary. All without re-running simulations that had already produced results.
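
The filename-to-control-file step could look roughly like this in plain shell rather than m4. The naming scheme (`rate-<r>_nodes-<n>.png` with a matching `.dat` file), the function name, and the plotted columns are all assumptions made up for this sketch:

```shell
#!/usr/bin/env bash
# Assumed naming scheme (hypothetical): rate-<r>_nodes-<n>.png, with the
# simulation output in a .dat file of the same base name.
gnuplot_script() {
  local png=$1
  local base=${png%.png}          # rate-0.5_nodes-10
  local rate=${base#rate-}        # 0.5_nodes-10
  rate=${rate%%_*}                # 0.5
  local nodes=${base##*_nodes-}   # 10
  # Unquoted heredoc, so the parameters are expanded into the gnuplot commands.
  cat <<EOF
set terminal png
set output '$png'
set title 'rate=$rate, nodes=$nodes'
plot '$base.dat' using 1:2 with lines notitle
EOF
}

gnuplot_script rate-0.5_nodes-10.png
```

A make pattern rule could then derive each plot from its target name alone, which is what lets only the missing simulations be rerun.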


I use it as a simple static site generator and for generating C++. The thing I dislike most is that it uses ` and ' for quoting and there is no easy way of escaping quotes.


You can change the quote chars. This is why autotools use [].


Used it for some C code, because CPP was not able to accomplish what I wanted. m4 was more powerful and I was able to finish my task.

Thankfully I never used it in the context of sendmail or autotools, so no reason to hate it for me, I guess.


Not today, but I used m4 in the late 80s to generate appropriate makefiles from a single source to allow me to compile my programs on HP-UX, Apollo Aegis, SunOS, AIX, etc. Happy days.


Are there in-place macro tools that could be similar to this?

Examples:

- Italics in a live markdown editor: the asterisks are usually hidden after you've closed the pair, but you can get them back if you backspace with your cursor at the 's'.

- [[]]-style links in similar live editors hide/show content as the syntax has been designed to do.

These examples do it for me, the author, not just in publication.

Another example is array-style duplication in CAD software. Draw one thing, give a command like "copy 10 times" and then move the piece and see 10 copies laid out.

I can imagine using something like that `define` example in m4 and wanting the instruction to propagate throughout my text file, yet retain the command as a sort of undo/toggle option. Again, all for me the author as I think through my work in this editor, publication isn't really the point in these thoughts.

The workflow for applying formulas to org mode tables is fantastic. I'm still an absolute novice, but I can keep an array of TBLFM formulas[1] below my table which I write and execute at will (C-c C-c or ,,) and can retain as a sort of history of the table state. There seem to be an endless variety of hooks I can tap into for these kinds of table editing sessions.

I'm thinking about this in the context of the process of doing research and writing. When my thoughts are not concrete, I do not have a formal document I'm writing but I'm learning and thinking and connecting information. I am in the process of auditing a whole load of these tools and I want a tool that is the superset of all functionality I see, not sacrificing the awesome transclusion of TiddlyWiki or internal links in org mode for the drawing abilities of OneNote.

I think what I want is an exceptionally powerful "viewport" for lack of a better term. I take great inspiration from ZUIs and wonder how to apply the fantastic text-based tools I know to a semantically zooming canvas. I don't like directories, I do like juxtaposition of disparate thoughts and projects. Can I have an AutoCAD/OneNote/orgmode? I want one. m4 inspires me towards this goal, even though I realize that's not exactly its intention.

[1]: https://orgmode.org/manual/Durations-and-time-values.html#Du...


Tbh I can't figure out what you're asking for.


As an example, Dropbox Paper supports a number of keyboard-text[1] to formatting capabilities.

- The markdown goodies (bold, italics, line break, headings, quotes, etc)

- Dropbox Paper specific items like +LinkToDocument, @Person, #tag

- Digital Paper functionality like stylized tables /note red, creates a red box and puts your cursor inside it

- literally just two commandline tools /date, /time

These are all much more convenient than clicking GUI buttons to get these formatting options yet incredibly limited. There are on the order of tens of these capabilities in any similar application. Notion, Slack, TiddlyWiki, etc.

I can think up hundreds of keyboard-text -> computer-things. Sure they are basic and understood processes, but unless they are built by a developer for a walled garden application you can't really have them. In general, this is one of the reasons I primarily use vim/emacs and linux, so I can build my own interface for my computer. But I'm interested in bringing that textgrid customization into a canvas application that supports more media types than text. vim + repl + powerpoint. Keyboard driven, mouse driven, stylus driven and as powerful as a computer can be.

- I can highlight 3 paragraphs and type /column to make a three column layout.

- I can type /tabs to create tabs in my document ala tiddlywiki tabs.

- I can link statusbar like information into a document with /uptime and document how much time this document has been open.

I would like these capabilities in a shared document system to be able to work collaboratively on a higher level than text, and with more programmatic options than the tens that these applications provide.

I think I can't express this well and I'm really at the beginning of thinking about it, thanks for taking the time to read it. Maybe there's nothing here and this is just a manifestation of my frustration of using a computer.

[1]: things I can type on my ~80 or ~100 key keyboard.


Maybe OneNote would be a place to look.


I used OneNote every day for a month. It's a toy application not meant to be used seriously. It has no navigational tools beyond its hierarchy of files and search. Import a 78 page pdf to read and annotate? You can't set bookmarks or even jump to the bottom.

Searching the text of pdfs just failed in 3/3 tests I tried going through my old notes to find a document.


You can use hyperlinks as a navigational tool as well. You can get a link to any paragraph on any page and paste anywhere else. You can also use tags and "tag summary pages."


I'll have to look more into this, thanks for the tip. I have found it incredibly difficult to figure out how OneNote was intended to be used. The in-app tutorials are basic in the extreme (describing only the visible UI) and I couldn't seem to google-fu it well. There are endless YouTube videos and reddit fluff posts but even the technical looking channels just have a 10 min video where the main content consists of a stylized "Science is cool" header with some vectors or bio notes.

As I said, I used it for my full class load for a month, then gave up on it entirely. Too many papers and too many large documents of notes. I get much more reusability, ease of access, and searchability from my pile of md documents next to a pile of PDFs: I write `zathura path/to/paper` at the top of each section in my markdown doc that I edit in vim, then when I want it open I put my cursor on it and hit `gx`. This beat out any OneNote workflow I could figure out in a month of daily use.


Ok good to know



I wonder if the "quoting" problem could be fixed by making m4 recognize Lua-style longstrings, which never need escape characters. Some examples:

http://lua-users.org/wiki/StringsTutorial

I considered using longstrings to make a "better CSV" once but then I realized that quoting is only one of the many problems with CSV.


I once developed a tiny single-file bash script to do dumb string substitution for arbitrary files and arbitrary strings, and I used this exact Lua mechanic to avoid escaping issues if the delimiters I chose collide with the syntax. It's a pretty great mechanic.

My script is here if you wanna check it out https://github.com/luizberti/templ


It's not clear to me what you mean by "longstrings" from that page. Are you referring to the double-bracket quoted style?


Yes. A Lua-style longstring is delimited by [===[ and ]===] where the number of = signs can be arbitrary (including zero); the end must have the same number of = signs as the beginning. Any string can be encoded inside of a longstring without escaping, simply by using long enough delimiters. There are a number of advantages to this notation: it's more readable (usually), it generally results in shorter code files, and longquoted substrings can be directly addressed without translation (though there may be other reasons not to pass that pointer around).
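
To make "long enough delimiters" concrete, here is a rough sketch of choosing the level. `long_quote` is a hypothetical helper written for this comment; neither m4 nor Lua ships it. It also ignores one Lua detail: a newline directly after the opening bracket is dropped when Lua reads the string back.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap arbitrary text in a Lua-style long string,
# using the smallest level whose closing delimiter cannot end it early.
long_quote() {
  local text=$1 eq=''
  # A premature close "]<eq>]" can sit inside the text, or straddle the end
  # of the text and the start of the real closing delimiter, so the check
  # runs against the text followed by "]<eq>".
  while [[ "${text}]${eq}" == *"]${eq}]"* ]]; do
    eq+='='
  done
  printf '[%s[%s]%s]\n' "$eq" "$text" "$eq"
}

long_quote 'plain'        # prints: [[plain]]
long_quote 'a[1]] = b'    # prints: [=[a[1]] = b]=]
long_quote 'a]'           # prints: [=[a]]=]
```

The last call shows why checking the text alone is not enough: `a]` contains no `]]`, yet `[[a]]]` would close one character too early.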


I've seen M4 used to implement instruction decoding logic (for the POWER instruction set) in a design and to describe a bus interconnect in SystemVerilog. Still not sure if it was a good or bad idea...



Ah my lovingly hand-crafted fvwm config & menu files on that 386.


dnl This is a comment: M4 is insane.


(2008)


In this thread the top two comments couldn't be more opposite:

PRO: "I've become a huge fan of M4 ... M4 was easy to use, and unless you do something crazy, fairly easy to read. "

CON: "M4 is so impotent and cripplingly difficult to use, that it led me to create my own dumb templating engine[0]"


Deprecate it with fire.



