Elegant, but not something you should ever use outside of ad-hoc situations.
I think this is more comprehensible, and is also more robust because it actually specifies the field you're looking for rather than just any line with quotes:
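(Chart.yaml below is just a stand-in for whatever file is being read.)
awk -F '"' '/^[[:space:]]*name:/ { print $2 }' Chart.yaml   # only name: lines; $2 is the quoted value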
It’s a pity that awk doesn’t support capture groups in line patterns. Then you could get rid of the field separator and make the script even more comprehensible. (You can simulate this with match() in gawk, but then you must know how match() works.)
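For the curious, the match() route looks roughly like this (gawk-only; same stand-in filename):
gawk 'match($0, /^[[:space:]]*name:[[:space:]]*"([^"]*)"/, m) { print m[1] }' Chart.yaml   # m[1] holds the capture group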
Personally, I’d probably rather use sed here:
sed -nE 's/^\s*name:\s*"([^"]*)"\s*$/\1/p'
While that regex is more complex, it is also safer and more explicit.
I thought I was pretty good at regex, but I could never have written this one and had to consult both `man grep` and regex101.com.
Explanations for the beginner and intermediate regex and grep user:
`-o`: Only return the match, instead of the entire line
`-P`: use Perl compatible regex
`-m NUM`: max-count; stop reading a file after NUM matching lines
And now for the regex:
`name:`: match the literal text `name:`
`\s*"`: Zero or more spaces leading up to and including a double quote
`\K`: This was the kicker for me. "resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match" - basically tells the regex engine that the characters _before_ `\K` need to be there in order to form a match, but that it should only return the characters _after_ `\K` as the match. This is super handy! Is there a "reverse \K"?
`[^"]+`: One or more characters that are not a double quote. This basically means "Find the line that has a key called "name" and return all the characters after the first double quote and until the last double quote"
"How do I make it print only the captured group with the number, not the whole line?" is a pretty common Stack Overflow question. The "\K" thing gets rid of the "start: " part, but what about " end"? That's were "reverse \K" would come in handy.
The word "often" in the parent comment implies there are times when GNU grep is not better suited than sed with ERE. (I often use flex instead of sed.)
Large repetition counts in the `{n,m}' construct may cause grep to use lots of memory. In addition, certain other obscure regular expressions require exponential time and space, and may cause grep to run out of memory.
Back-references can greatly slow down matching, as they can generate exponentially many matching possibilities that can consume both time and memory to explore. Also, the POSIX specification for back-references is at times unclear. Furthermore, many regular expression implementations have back-reference bugs that can cause programs to return incorrect answers or even crash, and fixing these bugs has often been low-priority: for example, as of 2021 the GNU C library bug database contained back-reference bugs 52, 10844, 11053, 24269 and 25322, with little sign of forthcoming fixes. Luckily, back-references are rarely useful and it should be little trouble to avoid them in practical applications.
Most of my AWK scripts are actually standalone scripts, as with `#!/usr/bin/awk -f` or similar. It's actually a very readable and comfortable little language once you free yourself from the one-liner constraint.
The $0=$2 solution has to find the second field and test it for truthiness for every line, while the regex one only has to do that for those matching the regex, and matching that regex will be fast for most lines (skipping spaces and then testing for a fixed string ‘name’ is easy)
Regardless, the regex solution is more robust and maintainable.
Depending on how this gets used, I might add code to detect the ‘appVersion’ start and end lines, set/clear a flag there, and only match the ‘name’ lines when that flag is set to make it even more robust (who knows what other lines might contain ‘name’ now or in the future?)
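A sketch of what I mean, with made-up marker patterns since I don't know the real file layout:
awk '
  /start-marker/ { in_block = 1 }    # hypothetical line that opens the section
  /end-marker/   { in_block = 0 }    # hypothetical line that closes it
  in_block && /^[[:space:]]*name:/   # name: lines inside the section only (default action: print)
' file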
Awk is such a weird tool--it's powerful and so few people know how to leverage it.
Yesterday, someone in chat wanted to extract special comments from their source code and turn them into a script for GDB to run. That way they could set a break point like this:
void func(void) {
//d break
}
They had a working script, but it was slow, and I felt like most of the heavy lifting could be done with a short Awk command:
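Something in this spirit, give or take (the source/*.c glob is just a guess):
awk -F '//d[[:space:]]+' 'NF > 1 { print FILENAME ":" FNR, $2 }' source/*.c   # $2 is whatever follows //d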
This one command finds all of those special comments in all of your source files. For example, it might print out something like:
source/main.c:105 break
source/lib.c:23 break
The idea of using //d[[:space:]]+ as the field separator was not obvious, as many Awk tricks aren't to people who don't use Awk often (that includes me).
(One of the other cases I've heard for using Awk is for deploying scripts in environments where you're not permitted to install new programs or do shell scripting, but somehow an Awk script is excepted from the rules.)
Unfortunately, the universal POSIX standard of awk only supports single-character, non-regular expression field separators (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/a...). It's arguable whether one should write POSIX-compliant awk or not (similar arguments apply for shell scripting).
When feasible, I try to write POSIX-compliant awk, so the script could have been written as:
- Delete the part of the line up to and including //d,
- Print the file, line number, and the rest of the line.
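That is, something along these lines (a sketch; the file list is assumed):
awk '/\/\/d / { gsub(/.*\/\/d /, ""); print FILENAME ":" FNR, $0 }' source/*.c   # delete up to //d, print file:line and the rest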
Maybe it's familiarity with regular expressions? If you're not familiar with regular expressions, that Awk is gonna look a bit funny. The regular expressions look a bit messy just because they have to match literal slashes, so you get /\/\/
There is no use of funny features or clever tricks in the code, it's just kind of straightforward, mindless code that does exactly what it says it does. It's definitely less clever than the Awk invocation that I wrote (which is a good thing).
If I see that regex in code somewhere, I'd have to stop and break it down the way you did to understand what it's doing, and I've spent a fair bit of time with regex. I think reading it and understanding what it does in one go would require far more than mere familiarity. (and that's not including the awk-isms like FNR etc.)
“That regex” makes me suspect that you think that the entire script is a regex. It’s not. It’s an Awk script, with two regexes in it.
First, know that Awk goes line by line. The script is implicitly executed for each line in the input. That’s just the entire thing Awk does, normally—if you want to process files line by line, and your needs are simple, well, Awk fills in the gaps where stuff like “cut” falls short (and I can never remember how to use cut, so I just use Awk anyway).
Second, know that “if” is implicit in Awk. You don’t write this:
if (condition) { code }
You write this instead:
condition { code }
This is like how Sed works, or Vim, except you get braces and the syntax is a bit easier to read.
The code block contains two statements: one function call (gsub) and then print.
So the first regular expression is just “//d ”, with some escaping for the slashes. The second regular expression is “.*//d ”. I do think that someone with basic familiarity with regexes should have no problem understanding these.
Cool! Now do this with every line of awk you're asking someone else to maintain, give up, and then write it in a better, more explicit language instead.
Feels like I’m on Usenet again when I read comments like this.
Awk is mostly nice for one-liners and is something you can just write ad hoc on the command line. It's good at that. I could write the same thing as a Python script, but it would take longer, and I would need to know that Python is installed on the system; Awk has a larger install base and is found on "minimal" installs.
If you hate Awk, and think it’s stupid, don’t use it. Seems like a big waste of time trying to argue with people who like Awk. That’s the kind of discussion I remember from my experience on Usenet in the 90s, and it seems like some people haven’t learned to move on.
> it's powerful and so few people know how to leverage it
Because otherwise it is useless. It has the same fate as AHK scripting and similar little languages. The language may be okay for its task, but if you cannot or do not use it for other things (unlike e.g. perl) at least from time to time, chances that you will learn and remember it are low. People know sed because regexps are everywhere. People use perl instead of awk, because they have a muscle memory for it. They may know [, find and glob for their relative generic-ness. They ignore awk and ahk because these are too niche to pay enough attention to. You either find a snippet or just move on.
If you are not constrained by a single line in a script, it’s easier to feed a heredoc into an interpreter of choice.
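For example, something like this (python3 is just one possible interpreter, and the script is a toy):
python3 - Chart.yaml <<'EOF'
# the program arrives on stdin; arguments after "-" land in sys.argv
import re, sys
for line in open(sys.argv[1]):
    m = re.match(r'\s*name:\s*"([^"]*)"', line)
    if m:
        print(m.group(1))
        break
EOF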
Yes, ahk is a great framework for its purpose. It's hard to recreate from scratch. But it would hugely benefit from having literally any mainstream language around it. Imagine how many plugins simply wouldn't exist, or wouldn't have shipped as quickly, if e.g. vscode had invented its own arcane vscodescript instead of js/ts. Or how much richer desktop automation could be if ahk could use pip, npm, IDEs and interoperate with other “junction” tech seamlessly.
The question I prefer to ask is “how much functionality can I comprehend in a given amount of time” rather than “how many lines of code can I comprehend in a given amount of time”.
(Honest question) what do you feel that your comment added to the one above it?
Are you suggesting that "dumb verbose code" might not be legible (I suppose that's technically possible, but seems unlikely to happen by accident)?
Or are you implying that Perl consists of "hieroglyphics" and so is not a suitable language for writing legible code? This, I think, would miss the point - deepsun was saying that, in both Perl and in awk, readers prefer legible code over cleverness - to claim that Perl cannot be legible at _all_ requires a little more justification, and would probably be disputed on the grounds that familiarity with a language's conventions is often a prerequisite for legibility.
I think it’s additive. We like to feel smart and can overcomplicate things. I much prefer a boring, non-trendy approach that’s easy to maintain over the new hotness every time.
Yes, I was basically saying that terse code is often unreadable because the terseness comes from what amounts to a weird compression algorithm: it satisfies a compiler, but it buries information for humans in weird syntax instead of something resembling English.
Awk, like vim, is a tool I love dearly that I absolutely would never recommend someone else learn. It’s like a form of mental illness, but it’s so lodged in my brain that I’ll only give it up when they put me in the grave.
Depends on the person and his specific situation. A CS student should absolutely take the time to learn to use vim, awk, gdb, etc. For a self-learned dev who is already working and already has his habits, I don't think this is worth his time.
Why is it worth the time of a CS student and not worth the time for a self-learned developer?
Feels like the reasoning should be something like: either learning it makes you more efficient, or it doesn't. If it does, learn it; if it doesn't, don't, regardless of your background.
Learning this will make you more efficient, always. Using gdb (pdb for me rn, but it's the same) or vim especially, but you have to consider your working week, the habits you've forged, and, as a sibling comment said, the opportunity cost.
Let's be honest, CS students do have time to try things, do vimtutor or regex golf (yeah, add regex to the list too) and other stuff like that. Once you're working, you lose some agency. And gain some.
Recently at a daily, I offered to help write my coworker's regex. He is a fine dev/ops guy, but self-taught (ex electronics guy) and misses some basics that aren't useful 99% of the time. He could've written his regex without help, but this is typically the case where getting more efficient isn't really worth the cost once you're working.
> For a self-learned dev who is already working and already has his habits
this implies that the self-learned dev has habits that are just as efficient as the unix toolset being recommended for the CS student.
But you don't know if that's actually true - it might be for some people, but not for others. It's a skill in itself to figure out whether your current toolkit is lacking and whether a better one exists.
Off-topic, but I _love_ the "sidenote" format for footnotes. I've been meaning to implement that in my own blog for a while, now. I'll check out the source for inspiration.
I think awk and sed are great, and it's great to eschew pipes when a powerful component that's already in the pipeline, like sed, awk, or grep, can pull double duty. But I do think that one should favor using `grep -P` with a positive lookbehind in a case like this. I can just picture an innocuous string being added anywhere above the version string and throwing this one-liner off, when a regex could have pulled the version string out from an arbitrary position within the file and would hold up much more robustly. You don't want to impose opaque constraints on future edits even if you're the only one who will be editing the file; it's just too easy to forget and trip over later.
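Something like this is what I have in mind (assuming a line like appVersion: "1.2.3" with no leading whitespace; adjust the key to taste):
grep -oPm1 '(?<=^appVersion: ")[^"]+' Chart.yaml   # the lookbehind keeps the key out of the match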
I'm whatever the opposite of a code golf extremist is.
E.g. I hate the so-called "useless use of cat." I uselessly use cat all the dang time; visualizing the pipeline is infinitely more useful than, what, "elegance"? Who cares?
Same here. I use `cat` as a tool to read data from the filesystem into a pipeline, regardless of whether that's the original intention of the tool. I don't want the input filename sitting to the right of the first manipulation command in a pipeline that's supposed to read left to right!
Well, it's only intuitive because you've already memorized what all the options and commands do. I, for one, don't even remember how to use head; I just
sed 15q
instead of head -n 15. Sometimes s/// is clearer than cut. If you're starting out with Unix, learning sed/awk is enough to do everything.
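For instance, pulling the first field out of /etc/passwd (just an illustration):
sed 's/:.*//' /etc/passwd    # delete from the first colon onwards
cut -d: -f1 /etc/passwd      # same result with cut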
As someone who teaches this stuff, I don't think so? Down the line you can do memorization of quicker ways, but I think "memorization" and "intuition" are opposites here.
As in, starting the pipeline with "cat" every time makes more intuitive sense than taking the time to figure out which command to start with?
Sure, is `egrep a | egrep b | egrep c` more intuitive than `awk '/a/ && /b/ && /c/'`? Sure, until you need more control over the pipe, in which case you've already forked to awk. So why not go all the way and keep the operation as an awk expression? The great thing about it is, the more you use a tool, the faster you are at using it.
I think good awk scripts lie somewhere between "This is atrociously overfit to this file" and "this is so general you should have done it in python/perl/etc".
Of course there are legitimate uses for both super-specific and super-general awk scripts, but finding the right compromise is what makes a good awk script. You want your script to be concise yet robust to changes in the input files. Also, readability and maintainability are really important if you plan to add it to an important script.
I understand the sentiment. As a software engineer I refrain from clever code especially if somebody else has to come and maintain it.
In this case, once you realize how the defaults stack, whoever came up with the one-liner has indeed come up with something clever. Also, the exercise of understanding the defaults cements your understanding of awk and, as others have pointed out, enables you to write much cleaner awk.
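For anyone wondering what "the defaults" are here, roughly: a pattern with no action defaults to printing the line, and a bare expression's truth value acts as the pattern, which is presumably what makes the $0=$2 one-liner mentioned above tick. Two throwaway examples:
awk '/error/' log.txt       # no action given: the default action is to print the matching line
awk -F '"' '$0 = $2' file   # the assignment's value is the pattern; lines with a non-empty, non-zero $2 get printed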
Too much text to explain what could have been explained clearly in 3-4 sentences. Making simple things complicated and then writing a long essay with dramatic phrases is a disservice to awk.
* https://backreference.org/2010/02/10/idiomatic-awk/ — how to write more idiomatic (and usually shorter and more efficient) awk programs
* https://learnbyexample.github.io/learn_gnuawk/preface.html — my ebook on GNU awk one-liners, plenty of examples and exercises
* https://www.grymoire.com/Unix/Awk.html — covers information about different `awk` versions as well
* https://earthly.dev/blog/awk-examples/ — start with short Awk one-liners and build towards a simple program to process book reviews