First of all, I'm not a distro maintainer. I also doubt that people would use awk in seriously space-constrained environments. And distros ship both awk and Python anyway. And again, I don't understand why they'd support networking but not basic data types/functions.
The only reason I could see to use awk was to throw code together more quickly in a DSL.
However, that turns out to be much less true than I had hoped. For the one-liners there are usually specialized tools like fex that are easier to use and faster (for batch processing).
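To illustrate the kind of one-liner I mean (cut stands in for fex here since the idea is the same, though fex's selector syntax is terser; the file is just an example): pulling a couple of fields out of /etc/passwd.

    # awk: flexible, but you pay for a full language runtime
    awk -F: '{ print $1, $7 }' /etc/passwd

    # cut: a specialized field extractor does the same job with less machinery
    cut -d: -f1,7 /etc/passwd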
When I compared my C/Python/awk programs, the differences were on the order of milliseconds/seconds/minutes. As soon as I use such a program repeatedly, that gap starts to hurt my productivity. And development time in the non-awk languages is not orders of magnitude longer.
> I also doubt that people would use awk in seriously space-constrained environments. And distros ship both awk and Python anyway.
Python is absolutely not available everywhere one can find Awk. I've never seen a system with Python but not Awk, but have seen many systems with Awk but not Python (excluding the BSDs, where Python is never in base, anyhow).
Actually, not many years ago I used to claim that I had never seen a Linux system with Bash that lacked Perl, but had seen systems with Perl that lacked Bash. (And forget about Python.) This was because most embedded distros use an Ash derivative, often because they rely on BusyBox for core utilities or come from a simple Debian install. Perl might not have been installed by default either, but it invariably got pulled in as a core dependency of anything sophisticated. Anyhow, the upshot was that you'd be more portable, even within the realm of Linux, with a Perl script than with a Bash-reliant shell script. Times have changed, but only in roughly the past 5 years or so. (Nonetheless, IME Perl is still slightly more reliably available than Python, but the variance is greater, which I guess is a consequence of Docker.)
One thing to keep in mind regarding utility performance is locale support. Most shell utilities rely on libc for locale support, such as I/O translation. Last time I measured, circa 2015, setting LC_ALL=C significantly improved standard I/O throughput on glibc systems (2x or better; I forget the exact figure, so I'm being conservative).[1] I never investigated the reasons. glibc's locale code is a nightmare[2], and that's more than enough explanation for me.
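If you want to see the effect yourself, here's a minimal comparison (bigfile is a placeholder for any large text file; the exact speedup depends on your libc and your default locale):

    # locale-aware collation, typically under the default UTF-8 locale
    time sort bigfile > /dev/null

    # byte-wise collation in the C locale; on glibc this is often dramatically faster
    time LC_ALL=C sort bigfile > /dev/null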
Heavy scripting languages like Perl, Python, Ruby, etc., do most of their locale work internally and, apparently, more efficiently. If you don't care about locale, or are just curious, set LC_ALL=C in the environment and test again. I set LC_ALL=C in the preamble of all my shell scripts. It makes them faster and, more importantly, has sidestepped countless bugs and gotchas.
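Concretely, the preamble amounts to a couple of lines at the top of the script (shown in portable POSIX sh):

    #!/bin/sh
    # Pin the locale for the whole script: faster libc I/O, and ranges
    # like [a-z] and sort order behave predictably instead of varying
    # with whatever locale the caller happens to have set.
    LC_ALL=C
    export LC_ALL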
For the things I do, and I imagine for the vast majority of things people write shell scripts for, you don't need locale support, or even UTF-8 support. And even if you do care, the rules for how UTF-8 changes the semantics of the environment are complex enough that it's preferable to refactor things so you don't have to care, or can isolate the parts that need to care to a few utility invocations. In practice, system locale work has gone hand-in-hand with making libc and shell utilities 8-bit clean in the C/POSIX locale, which is what most people care about even when they care about locale.
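A sketch of that isolation, assuming one step genuinely needs locale-aware collation (the locale name and file names here are examples): run the whole script under C and override the locale for the single invocation, since a prefix assignment in POSIX sh applies only to that one command.

    #!/bin/sh
    export LC_ALL=C    # fast, 8-bit-clean byte semantics by default

    grep -v '^#' input.txt > names.txt    # bulk of the work runs under C

    # only this step cares about locale; the assignment is scoped to it
    LC_ALL=en_US.UTF-8 sort names.txt > sorted.txt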
[1] The consequence was that my hexdump implementation, http://25thandclement.com/~william/projects/hexdump.c.html, was significantly faster than the wrapper typically available on Linux systems. My implementation did the transformations through a tiny, non-JIT'd virtual machine, while the wrapper, which only supports a small subset of options, did the transformation in pure C code. My code was still faster even compared to LC_ALL=C, which implied that glibc's locale architecture has non-negligible costs of its own.
[2] To be fair, it's a nightmare partly because they've had strong locale support for many years, and the implementation has been mostly backward compatible. At least, "strong" and "backward compatible" relative to the BSDs. Solaris is arguably better on both counts, though I've never looked at their source code. Solaris' implementation was fast, whatever it was doing. musl libc has the benefit of having started last, so it supports only the C and UTF-8 locales, and in most places in libc UTF-8 support simply means being 8-bit clean, so the cost is zero or perhaps even negative.
There was a long period of time when it was easy to find a non-Linux Unix with Perl installed but not Bash: on SunOS, Solaris, IRIX, etc., admins would typically install Perl pretty early on, while Bash was more niche. Like, maybe 1990 to 2000. Now we're getting into an era where lots of Unix boxes run macOS, and although they have Bash, it's a version of only archaeological interest. But they do have Perl.