
I use awk because there's an almost 100% chance that it's going to be installed on any unix system I can ssh into.

I use awk because I like to visually refine my output incrementally. By combining awk with multiple other basic unix commands and pipes, I can get the data that I want out of the data I have. I'm not writing unit tests or perfect code, I'm using rough tools to do a quick one-off job.

For instance: "mail server x is getting '81126 delayed delivery' messages from Google in the logs; find out who is sending those messages".

# get all the lines with the 81126 message. Get the queue IDs, exclude duplicates, save them in a file.

cat maillog.txt | grep 81126 | awk '{print $6}' | sort | uniq | cut -d':' -f1 > queue-ids.txt

# Grep for entries in that file, get the from addresses, exclude duplicates.

cat maillog.txt | grep -F -f queue-ids.txt | grep 'from=<' | awk '{print $7}' | cut -d'<' -f2 | cut -d'>' -f1 | sort | uniq

Each of those 2 one-liners was built up pipe-by-pipe, looking at the output, finding what I needed. It's not pretty, it's not elegant, but it works. I'm sure there's a million ways that a thousand different languages could do this more elegantly, but it's what I know, and it works for me.
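For the curious, the same job collapses into two awk calls with no grep/cut/sort. A rough sketch, assuming the same maillog layout as the pipelines above (queue ID in $6, from=<...> in $7):

    # unique queue IDs from lines with the 81126 message
    awk '/81126/ { split($6, q, ":"); ids[q[1]] }
         END { for (i in ids) print i }' maillog.txt > queue-ids.txt

    # unique from addresses for entries matching those queue IDs
    awk 'NR == FNR { ids[$0]; next }        # first file: the queue IDs
         /from=</ {
             for (i in ids)
                 if (index($0, i)) {        # fixed-string match, like grep -F
                     addr = $7
                     sub(/.*</, "", addr); sub(/>.*/, "", addr)
                     out[addr]
                 }
         }
         END { for (a in out) print a }' queue-ids.txt maillog.txt

(The output order differs from sort | uniq, since awk's for-in iteration is unordered.)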




I know you’re not asking for awk protips but you can prefix the block with a match condition for processing.

... | grep foo | awk '{print $6}' | ...

becomes

... | awk '/foo/{print $6}' | ...

If you start working this into your awk habits you’ll find delightful little edge cases that you can handle with other expressions before the block (you can, for example, match specific fields).
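For instance, sketched in the style of the snippets above (the field numbers are just placeholders):

    # match only when the 6th field contains foo
    ... | awk '$6 ~ /foo/ {print $6}' | ...

    # or require an exact match on one field plus a regex on another
    ... | awk '$1 == "ERROR" && $6 ~ /foo/ {print $6}' | ...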


No one has mentioned changing the default field separator, e.g.,

  awk FS=:   '{print $1}' instead of cut -d: -f1

  awk FS="<" '{print $2}' instead of cut -d'<' -f2

  awk FS=">" '{print $1}' instead of cut -d'>' -f1


No need to explicitly set FS! Just use:

echo test,123 | awk -F, '{print $1}'


Yikes. The syntax I had was wrong anyway. Should have been

   awk 'BEGIN {FS=":"};{print $1}'

One benefit of the FS variable over -F, at least in original awk, is that by using FS the delimiter can be more than one character. I guess that's why I remember FS before I remember -F. More flexible.
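For example, a multi-character separator set via FS (treated as a regex when longer than one character):

    $ echo 'one::two::three' | awk 'BEGIN {FS = "::"} {print $2}'
    two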


-F does allow multicharacter separators (at least true for me on bash shell and gawk)

    $ echo 'Sample123string42with777numbers' | awk -F'[0-9]+' '{print $2}'
    string


You were close! The following works as well:

  awk -v FS="\t"


If I am not mistaken, -v is GAWK only.


Every contemporary AWK supports -v. Real AWK from UNIX® supported -v since at least the '80s.


True. But there are differences when -v is used, as opposed to FS. Try this, where "nawk" is the Lucent awk used by BSD:

     cat > 1.awk << eof
     { print ARGC }
     eof

     echo | nawk -f 1.awk FS=":"
     echo | gawk -f 1.awk FS=":"
     echo | nawk -f 1.awk -v FS=":"
     echo | gawk -f 1.awk -v FS=":"


That is not how FS is set; it's set with -F. And there is actually no need to use -v: passing variables at the end works consistently across all AWKs and always has:

  echo "" | awk '{print Bla;}' Bla="Hello."


What if you set FS with -F but then later in the script want to change FS to something else?


You can: assigning to FS inside the script is fine, but the new value only takes effect starting with the next input record; the record that was just read keeps its old splitting. If you need a different FS for the very first record, set it with -F or in 'BEGIN {FS = "...";}'.
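A quick demonstration of the next-record behavior:

    $ printf 'a:b,c\nd:e,f\n' | awk -F: 'NR == 1 {FS = ","} {print $1}'
    a
    d:e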


So is -F, IIRC.


-F has always been supported by real UNIX® AWK; that's where -v and -F come from.


    BUGS
         The -F option is not necessary given the command line variable
         assignment feature; it remains only for backwards compatibility.

    EXAMPLES
         Print and sort the login names of all users:

              BEGIN { FS = ":" }
                    { print $1 | "sort" }

The above is from the GAWK manpage. FWIW, the first example under EXAMPLES uses FS, not -F.

There is nothing wrong with using FS instead of -F.


GAWK is not a real AWK!!! When will you people learn that GNU is not UNIX®?

FS is not meant to be set on the command line, and doing so is asking for trouble. FS is a built-in variable and as such is treated specially.


To pile on :-) you often want the -w (match word) flag to grep.

In awk, I couldn't find how to do this. I tried /\bfoo\b/ and /\<foo\>/ but neither worked. I don't know why, and I don't care enough to dig, which brings me to my major awk irritation ...

It doesn't use extended or Perl REs, which makes it quite different from Ruby, Perl, Python, and Java. Now, according to the man page it does, at least on OSX (man re_format), but as mentioned it didn't work for me.

Details

   $ echo fish | awk  '/\bfish\b/' 
gets nothing, vs

   $ echo fish | perl -ne  '/\bfish\b/ && print' 
fish


UGH! Found the problem; it simply doesn't work. Assuming the OSX awk is the same as the freebsd awk there is a very old open bug on this:

awk(1) does not support word-boundary metacharacters https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171725


GNU awk supports \< and \> for start- and end-of-word anchors, which work with GNU grep/sed as well.

GNU awk also supports \y, which is the same as \b, as well as \B for the opposite (same as GNU grep/sed).

Interestingly, there's a difference between the three types of word anchors:

    $ # \b matches both start and end of word boundaries
    $ # 1st and 3rd line have space as second character
    $ echo 'I have 12, he has 2!' | grep -o '\b..\b'
    I 
    12
    , 
    he
     2

    $ # \< and \> strictly match only start and end word boundaries respectively
    $ echo 'I have 12, he has 2!' | grep -o '\<..\>'
    12
    he

    $ # -w ensures there are no word characters around the matching text
    $ # same as: grep -oP '(?<!\w)..(?!\w)'
    $ echo 'I have 12, he has 2!' | grep -ow '..'
    12
    he
    2!


Sure, but a fair bit of the value of the tool is its consistency across platforms.

There's no point in awk if Perl etc. are ubiquitous and more consistent.


\< and \> work with GNU's awk:

  $ printf "fishstick\nfish\ngoldfish\n" | awk '/\<fish\>/' 
  fish


\b is a Perl RE feature, not ERE. AWK not only supports EREs, but POSIX REs as well.
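If you need a word match in a strictly portable awk, one workaround is to compare whole fields instead of relying on word-boundary anchors; a sketch using the fish example from above:

    $ printf 'fishstick\nfish\ngoldfish\n' | awk '{for (i = 1; i <= NF; i++) if ($i == "fish") {print; next}}'
    fish

(Unlike \b, this treats punctuation as part of the word, so "fish," would not match.)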


On the other hand, grep can be far faster than awk for searching alone. I almost always use an initial grep for the string that will most reduce the input to the rest of the pipeline. After that, it feels idiomatic to mix in awk with matches like you suggested.


Depends on the awk. mawk is surprisingly fast.


Right. I don't consider that particularly exhaustive at all, and this has helped me when I wanted to do quick searches.


I always forget about that, and I should try more to remember it. Thank you for the tip!


I disagree; it's quite elegant if you think in terms of relational algebra operators (see the sketch after this list):

* Projection (Π): awk and cut for simple cases

* Selection (σ): grep for simple cases, otherwise sed & awk

* Rename (ρ): sed

* Set operators: join, comm...
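A toy translation against /etc/passwd as the relation (field positions follow the usual passwd layout):

    # Selection then projection: names of users whose shell is /bin/sh
    awk -F: '$7 == "/bin/sh" {print $1}' /etc/passwd

    # Rename: sed 's/^root:/superuser:/' /etc/passwd

    # Set intersection of two sorted files: comm -12 a.sorted b.sorted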


Bravo! This is one of the most insightful comments I've read in a long time! I have been using some of these tools for years but I never thought of describing them this way. Now I can think of writing a complex query in relational algebra and translating it into these commands in a very natural way.


Here's an interesting article that links shell scripting and relational algebra - http://matt.might.net/articles/sql-in-the-shell/


Indeed, and with a bit of tuning (e.g., using mawk for most things), one can get quite good performance. [1] The project also provides a translator from Datalog to bash scripts. [2]

Disclaimer: I was one of the authors.

[1] https://www.thomasrebele.org/publications/2018_report_bashlo...
[2] https://www.thomasrebele.org/projects/bashlog/datalog


Thank you, and thank you (really, not sarcasm) for the new stuff I have to learn about relational algebra. I'm a huge fan of wide/shallow knowledge that allows me to dive into a subject quickly.


I’m pretty mathsy but I don’t get this


It is from relational algebra, as used in database theory. There is an excerpt from one of the first MOOCs ever offered, available on Lagunita now. [1] It is pretty intuitive once you get the hang of it.

[1] https://lagunita.stanford.edu/courses/DB/RA/SelfPaced/course...


Thank you for this context


ntfsql dreams


Its ubiquity and performance open up all kinds of sophisticated data processing on a huge variety of *nix implementations. Whether it's one liners or giant data scrubs, awk is a tool that you can almost always count on having access to, even in the most restrictive or arcane environments.


It's far more elegant and concise than any other scripting language I can think of using to accomplish the same thing.

As the article points out, other languages will have a lot more ceremony around opening and closing the file, breaking the input into fields, initializing variables, etc.
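A sketch of what that ceremony reduces to in awk: an implicit read loop, automatic field splitting, and BEGIN/END hooks in place of setup and teardown code (the UID >= 1000 cutoff for "regular users" is just an assumption for the example):

    awk 'BEGIN { FS = ":" }               # runs before any input is read
         $3 >= 1000 { n++ }               # per-record pattern/action, fields pre-split
         END { print n, "regular users" }' /etc/passwd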


The practical component of any software engineering degree should include a simple course on common Unix tools, covering grep, awk, sed, PCRE, and git.

A little bit of knowledge here goes a LONG way.


I wholeheartedly agree. I've seen people agonize for days over results from Splunk that they want to turn into something more user-friendly. Fifteen minutes of messing around with basic command-line Unix tools gets that information into a perfect format for their needs.

This is something I need to bring up with my coworkers; I should write some sort of basic guide to Unix tools for them.


> it's not elegant

I completely disagree.


Thank you, I too often talk down what I do.


Eloquently put!



