I use awk because there's an almost 100% chance that it's going to be installed on any unix system I can ssh into.
I use awk because I like to visually refine my output incrementally. By combining awk with multiple other basic unix commands and pipes, I can get the data that I want out of the data I have. I'm not writing unit tests or perfect code, I'm using rough tools to do a quick one-off job.
For instance, "mail server x is getting '81126 delayed delivery' from google messages in the logs, find out who is sending those messages".
# get all the lines with the 81126 message. Get the queue IDs, exclude duplicates, save them in a file.
cat maillog.txt | grep 81126 | awk '{print $6}' | sort | uniq | cut -d':' -f1 > queue-ids.txt
# Grep for entries in that file, get the from addresses, exclude duplicates.
cat maillog.txt | grep -F -f queue-ids.txt | grep 'from=<' | awk '{print $7}' | cut -d'<' -f2 | cut -d'>' -f1 | sort | uniq
Each of those 2 one-liners was built up pipe-by-pipe, looking at the output, finding what I needed. It's not pretty, it's not elegant, but it works. I'm sure there's a million ways that a thousand different languages could do this more elegantly, but it's what I know, and it works for me.
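For what it's worth, a pipeline like that can often be collapsed into one awk program that makes two passes over the log. A rough sketch, assuming a Postfix-style layout where field 6 is the queue ID (with a trailing colon) and field 7 is the from=<...> clause; the sample data here is invented for illustration:

```shell
# Hypothetical Postfix-style sample data, invented for illustration:
cat > /tmp/maillog.txt <<'EOF'
Jun  1 00:00:00 mx postfix/qmgr[2]: AAA111: from=<alice@example.com>, size=100, nrcpt=1
Jun  1 00:00:01 mx postfix/smtp[1]: AAA111: to=<x@gmail.com>, status=deferred (81126 delayed delivery)
Jun  1 00:00:02 mx postfix/qmgr[3]: BBB222: from=<bob@example.com>, size=100, nrcpt=1
EOF

# Pass 1 (NR==FNR): collect queue IDs from the 81126 lines.
# Pass 2: print the from= address of any record whose queue ID was collected.
awk 'NR == FNR { if (/81126/) { id = $6; sub(/:$/, "", id); ids[id] = 1 }; next }
     /from=</  { id = $6; sub(/:$/, "", id)
                 if (id in ids) { a = $7; sub(/.*</, "", a); sub(/>.*/, "", a); print a } }' \
    /tmp/maillog.txt /tmp/maillog.txt | sort -u
```

Whether that's clearer than the pipe-by-pipe version is debatable; the pipeline is much easier to build up incrementally.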
I know you’re not asking for awk protips but you can prefix the block with a match condition for processing.
... | grep foo | awk '{print $6}' | ...
becomes
... | awk '/foo/{print $6}' | ...
If you start working this into your awk habits you’ll find delightful little edge cases that you can handle with other expressions before the block (you can, for example, match specific fields).
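For instance, matching on a specific field rather than the whole line (the data and field positions here are made up):

```shell
# Only act on records where a particular field matches:
# print column 1 when column 2 is exactly "deferred"
printf 'alice deferred\nbob sent\ncarol deferred\n' |
    awk '$2 == "deferred" { print $1 }'
```

A plain grep would also match "deferred" appearing in any other column; the field comparison is precise.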
Yikes. The syntax I had was wrong anyway. Should have been
awk 'BEGIN {FS=":"};{print $1}'
One benefit of the FS variable over -F, at least in original awk, is that by using FS the delimiter can be more than one character. I guess that's why I remember FS before I remember -F. More flexible.
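For example, an FS longer than one character is treated as a regular expression (the sample input is invented):

```shell
# Split on the three-character delimiter " - " rather than a single char:
printf 'alice - 42 - ok\n' | awk 'BEGIN { FS=" - " } { print $2 }'
```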
That is not how FS is set; it's set with -F. And there is actually no need to use -v, since passing variables at the end works consistently across all awks and always has:
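As an illustration (the passwd-style sample file is made up), an assignment placed among the file operands takes effect before the files that follow it are read:

```shell
# Invented sample file in /etc/passwd format:
printf 'root:x:0:0:root:/root:/bin/sh\n' > /tmp/passwd.sample

# The FS=: assignment is applied before /tmp/passwd.sample is opened:
awk '{ print $1 }' FS=: /tmp/passwd.sample
```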
To pile on :-) you often want the -w (match word) flag to grep.
In awk, I couldn't find how to do this. I tried /\bfoo\b/ and /\<foo\>/ but neither worked. I don't know why and don't care enough, which brings me to my major awk irritation ...
It doesn't use extended or Perl REs, which makes it quite different from Ruby, Perl, Python, and Java. Now, according to the man page it does, at least on OSX (man re_format), but as mentioned it didn't work for me.
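One portable workaround is to spell the boundary out with explicit character classes, which plain POSIX awk regexes do support (sample input invented here):

```shell
# Roughly \bfoo\b: "foo" neither preceded nor followed by a word character
printf 'foo bar\nfoobar\nbar foo\n' |
    awk '/(^|[^A-Za-z0-9_])foo([^A-Za-z0-9_]|$)/'
```

It's verbose, but it behaves the same in every awk I've tried.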
GNU awk supports \< and \> for start and end of word anchors, which works for GNU grep/sed as well
GNU awk also supports \y, which is the same as \b, as well as \B for the opposite (same as GNU grep/sed).
Interestingly, there's a difference between the three types of word anchors:
$ # \b matches both start and end of word boundaries
$ # 1st and 3rd line have space as second character
$ echo 'I have 12, he has 2!' | grep -o '\b..\b'
I
12
,
he
2
$ # \< and \> strictly match only start and end word boundaries respectively
$ echo 'I have 12, he has 2!' | grep -o '\<..\>'
12
he
$ # -w ensures there are no word characters around the matching text
$ # same as: grep -oP '(?<!\w)..(?!\w)'
$ echo 'I have 12, he has 2!' | grep -ow '..'
12
he
2!
On the other hand, grep can be far faster than awk for searching alone. I almost always use an initial grep for the string that will most reduce the input to the rest of the pipeline. Later in the pipeline, it feels idiomatic to mix in awk with matches like you suggested.
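A sketch of that pattern (the input and field positions are made up):

```shell
# Cheap literal prefilter first (-F disables regex interpretation),
# then let awk do the field work on the much smaller remainder:
printf 'a ERROR x\nb OK y\nc ERROR z\n' |
    grep -F 'ERROR' | awk '{ print $3 }'
```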
Bravo! This is one of the most insightful comments I've read in a long time! I have been using some of these tools for years but I never thought of describing them this way. Now I can think of writing a complex query in relational algebra and translating it into these commands in a very natural way.
Indeed, and with a bit of tuning (e.g., using mawk for most things), one can get quite good performance. [1]
The project also provides a translator from Datalog to bash scripts [2].
Thank you, and thank you (really, not sarcasm) for the new stuff I have to learn about relational algebra. I'm a huge fan of wide/shallow knowledge that allows me to dive into a subject quickly.
It is from the relational algebra used in database theory. There is an excerpt from one of the first MOOCs offered, now hosted on Lagunita. [1] It is pretty intuitive once you get the hang of it.
Its ubiquity and performance open up all kinds of sophisticated data processing on a huge variety of *nix implementations. Whether it's one liners or giant data scrubs, awk is a tool that you can almost always count on having access to, even in the most restrictive or arcane environments.
It's far more elegant and concise than any other scripting language I can think of using to accomplish the same thing.
As the article points out, other languages will have a lot more ceremony around opening and closing the file, breaking the input into fields, initializing variables, etc.
The practical component of any software engineering degree should include a simple course on common Unix tools, covering grep, awk, sed, PCRE, and git.
I wholeheartedly agree. I've seen people agonize for days over results from Splunk that they want to turn into something more user-friendly. 15 minutes of messing around with the basic command line Unix tools has that information in a perfect format for their needs.
This is something I need to bring up with my coworkers; I should write some sort of basic guide to Unix tools for them.