Awk is such a weird tool: it's powerful, yet so few people know how to leverage it.
Yesterday, someone in chat wanted to extract special comments from their source code and turn them into a script for GDB to run. That way they could set a break point like this:
void func(void) {
//d break
}
They had a working script, but it was slow, and I felt like most of the heavy lifting could be done with a short Awk command:
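The exact one-liner isn't preserved here, but based on the description (the //d[[:space:]]+ field separator, file:line output), a reconstruction might look something like this; the file demo.c and its contents are invented for illustration:

```shell
# Hypothetical reconstruction, not necessarily the poster's exact command:
# treat "//d" plus whitespace as the field separator, so any line containing
# the marker splits into at least two fields, and $2 is the GDB command.
printf 'void func(void) {\n  //d break\n}\n' > demo.c

awk -F '//d[[:space:]]+' 'NF > 1 { print FILENAME ":" FNR, $2 }' demo.c
# prints: demo.c:2 break
```

Note that this relies on a multi-character FS being treated as a regular expression, which is exactly the portability wrinkle discussed further down.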
This one command finds all of those special comments in all of your source files. For example, it might print out something like:
source/main.c:105 break
source/lib.c:23 break
The idea of using //d[[:space:]]+ as the field separator was not obvious, as many Awk tricks aren't to people who don't use Awk often (that includes me).
(One of the other cases I've heard for using Awk is for deploying scripts in environments where you're not permitted to install new programs or do shell scripting, but somehow an Awk script is excepted from the rules.)
Unfortunately, the universal POSIX standard of awk only supports single-character, non-regular expression field separators (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/a...). It's arguable whether one should write POSIX-compliant awk or not (similar arguments apply for shell scripting).
When feasible, I try to write POSIX-compliant awk, so the script could have been written as:
- Delete the part of the line up to and including //d,
- Print the file, line number, and the rest of the line.
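A sketch of what those two steps might look like in POSIX-friendly awk (file name and contents are made up for the example):

```shell
# Sketch of the POSIX-compliant approach described above: sub() deletes
# everything up to and including "//d" plus trailing blanks, and returns
# 1 when it matched, so it doubles as the condition; then print the
# file, line number, and the rest of the line.
printf 'void func(void) {\n  //d break\n}\n' > demo.c

awk 'sub(/.*\/\/d[ \t]*/, "") { print FILENAME ":" FNR, $0 }' demo.c
# prints: demo.c:2 break
```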
Maybe it's familiarity with regular expressions? If you're not familiar with regular expressions, that Awk is going to look a bit funny. The regular expressions look a bit messy just because they have to match literal slashes, so you end up with /\/\/ sequences.
There is no use of funny features or clever tricks in the code; it's just straightforward, mindless code that does exactly what it says it does. It's definitely less clever than the Awk invocation that I wrote (which is a good thing).
If I see that regex in code somewhere, I'd have to stop and break it down the way you did to understand what it's doing, and I've spent a fair bit of time with regex. I think reading it and understanding what it does in one go would require far more than mere familiarity. (and that's not including the awk-isms like FNR etc.)
“That regex” makes me suspect that you think that the entire script is a regex. It’s not. It’s an Awk script, with two regexes in it.
First, know that Awk goes line by line. The script is implicitly executed for each line in the input. That’s just the entire thing Awk does, normally—if you want to process files line by line, and your needs are simple, well, Awk fills in the gaps where stuff like “cut” falls short (and I can never remember how to use cut, so I just use Awk anyway).
Second, know that “if” is implicit in Awk. You don’t write this:
if (condition) { code }
You write this instead:
condition { code }
This is like how Sed works, or Vim, except you get braces and the syntax is a bit easier to read.
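For instance (a made-up snippet, just to show the shape):

```shell
# "condition { code }" with no explicit if: NR % 2 is the condition,
# so the block only runs on odd-numbered input lines.
printf 'a\nb\nc\n' | awk 'NR % 2 { print NR, $0 }'
# prints:
# 1 a
# 3 c
```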
The code block contains two statements: one function call (gsub) and then print.
So the first regular expression is just “//d ”, with some escaping for the slashes. The second regular expression is “.*//d ”. I do think that someone with basic familiarity with regexes should have no problem understanding these.
Cool! Now do this with every line of awk you're asking someone else to maintain, give up, and then write it in a better, more explicit language instead.
Feels like I’m on Usenet again when I read comments like this.
Awk is mostly nice for one-liners and is something you can just write ad hoc at the command line. It’s good at that. I could write the same thing as a Python script, but it would take longer, and I would need to know that Python is installed on the system—Awk has a larger install base and is found on “minimal” installs.
If you hate Awk, and think it’s stupid, don’t use it. Seems like a big waste of time trying to argue with people who like Awk. That’s the kind of discussion I remember from my experience on Usenet in the 90s, and it seems like some people haven’t learned to move on.
it's powerful and so few people know how to leverage it
Because otherwise it is useless. It shares the fate of AHK scripting and similar little languages. The language may be fine for its task, but if you cannot or do not use it for other things at least from time to time (unlike e.g. perl), the chances that you will learn and remember it are low. People know sed because regexps are everywhere. People use perl instead of awk because they have muscle memory for it. They may know [, find and glob for their relative generic-ness. They ignore awk and ahk because these are too niche to pay enough attention to. You either find a snippet or just move on.
If you are not constrained by a single line in a script, it’s easier to feed a heredoc into an interpreter of choice.
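Something like this, for example (python3 chosen arbitrarily here; the same shape works for awk -f /dev/stdin, perl, and so on):

```shell
# Heredoc-fed interpreter: the program can span many lines without
# being crammed into one quoted shell argument.
python3 <<'EOF'
total = sum(range(1, 4))   # 1 + 2 + 3
print("sum:", total)
EOF
# prints: sum: 6
```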
Yes, ahk is a great framework for its purpose. It’s hard to replicate from scratch. But it would hugely benefit from literally any mainstream language around it. Imagine how many plugins simply wouldn’t exist, or wouldn’t have been delivered quickly, if e.g. vscode had invented its own arcane vscodescript instead of js/ts. Or how much richer desktop automation could be if ahk could use pip, npm, IDEs and interoperate with other “junction” tech seamlessly.
The question I prefer to ask is “how much functionality can I comprehend in a given amount of time” rather than “how many lines of code can I comprehend in a given amount of time”.
(Honest question) what do you feel that your comment added to the one above it?
Are you suggesting that "dumb verbose code" might not be legible (I suppose that's technically possible, but seems unlikely to happen by accident)?
Or are you implying that Perl consists of "hieroglyphics" and so is not a suitable language for writing legible code? This, I think, would miss the point - deepsun was saying that, in both Perl and in awk, readers prefer legible code over cleverness - to claim that Perl cannot be legible at _all_ requires a little more justification, and would probably be disputed on the grounds that familiarity with a language's conventions is often a prerequisite for legibility.
I think it’s additive. We like to feel smart and can over complicate things. I much prefer a boring non-trendy approach that’s easy to maintain over new hotness every time.
Yes, I was basically saying that terse code is often unreadable because its terseness amounts to a weird compression algorithm: it satisfies the compiler but buries information for humans in strange syntax instead of something resembling English.