
I use awk because there's an almost 100% chance that it's going to be installed on any unix system I can ssh into.

I use awk because I like to visually refine my output incrementally. By combining awk with multiple other basic unix commands and pipes, I can get the data that I want out of the data I have. I'm not writing unit tests or perfect code, I'm using rough tools to do a quick one-off job.

For instance: "mail server x is getting '81126 delayed delivery' messages from Google in the logs; find out who is sending those messages".

# get all the lines with the 81126 message. Get the queue IDs, exclude duplicates, save them in a file.

cat maillog.txt | grep 81126 | awk '{print $6}' | sort | uniq | cut -d':' -f1 > queue-ids.txt

# Grep for entries in that file, get the from addresses, exclude duplicates.

cat maillog.txt | grep -F -f queue-ids.txt | grep 'from=<' | awk '{print $7}' | cut -d'<' -f2 | cut -d'>' -f1 | sort | uniq

Each of those 2 one-liners was built up pipe-by-pipe, looking at the output, finding what I needed. It's not pretty, it's not elegant, but it works. I'm sure there's a million ways that a thousand different languages could do this more elegantly, but it's what I know, and it works for me.
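For the curious, the same job collapses into two awk calls with no grep/cut/sort. A rough sketch, assuming the same maillog layout as the pipelines above (queue ID in $6, from=<...> in $7):

    # unique queue IDs from lines with the 81126 message
    awk '/81126/ { split($6, q, ":"); ids[q[1]] }
         END { for (i in ids) print i }' maillog.txt > queue-ids.txt

    # unique from addresses for entries matching those queue IDs
    awk 'NR == FNR { ids[$0]; next }        # first file: the queue IDs
         /from=</ {
             for (i in ids)
                 if (index($0, i)) {        # fixed-string match, like grep -F
                     addr = $7
                     sub(/.*</, "", addr); sub(/>.*/, "", addr)
                     out[addr]
                 }
         }
         END { for (a in out) print a }' queue-ids.txt maillog.txt

(The output order differs from sort | uniq, since awk's for-in iteration is unordered.)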




I know you’re not asking for awk protips but you can prefix the block with a match condition for processing.

... | grep foo | awk '{print $6}' | ...

becomes

... | awk '/foo/{print $6}' | ...

If you start working this into your awk habits you’ll find delightful little edge cases that you can handle with other expressions before the block (you can, for example, match specific fields).
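For instance, sketched in the style of the snippets above (the field numbers are just placeholders):

    # match only when the 6th field contains foo
    ... | awk '$6 ~ /foo/ {print $6}' | ...

    # or require an exact match on one field plus a regex on another
    ... | awk '$1 == "ERROR" && $6 ~ /foo/ {print $6}' | ...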


No one has mentioned changing the default field separator, e.g.,

  awk FS=:   '{print $1}' instead of cut -d: -f1

  awk FS="<" '{print $2}' instead of cut -d'<' -f2

  awk FS=">" '{print $1}' instead of cut -d'>' -f1


No need to explicitly set FS! Just use:

echo test,123 | awk -F, '{print $1}'


Yikes. The syntax I had was wrong anyway. Should have been

   awk 'BEGIN {FS=":"};{print $1}'

One benefit of the FS variable over -F, at least in original awk, is that by using FS the delimiter can be more than one character. I guess that's why I remember FS before I remember -F. More flexible.
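For example, a multi-character separator set via FS (treated as a regex when longer than one character):

    $ echo 'one::two::three' | awk 'BEGIN {FS = "::"} {print $2}'
    two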


-F does allow multicharacter separators (at least true for me on bash shell and gawk)

    $ echo 'Sample123string42with777numbers' | awk -F'[0-9]+' '{print $2}'
    string


You were close! The following works as well:

  awk -v FS="\t"


If I am not mistaken, -v is GAWK only.


Every contemporary AWK supports -v. Real AWK from UNIX® supported -v since at least the '80s.


True. But there are differences when -v is used, as opposed to FS. Try this, where "nawk" is the Lucent awk used by BSD:

     cat > 1.awk << eof
     { print ARGC }
     eof

     echo | nawk -f 1.awk FS=":"
     echo | gawk -f 1.awk FS=":"
     echo | nawk -f 1.awk -v FS=":"
     echo | gawk -f 1.awk -v FS=":"


That is not how FS is set; it's set with -F. And there is actually no need to use -v: passing variables at the end works consistently across all AWKs and always has:

  echo "" | awk '{print Bla;}' Bla="Hello."


What if you set FS with -F but then later in the script want to change FS to something else?


You can: assigning to FS inside the script is fine, but the new value only takes effect starting with the next input record; the record that was just read keeps its old splitting. If you need a different FS for the very first record, set it with -F or in 'BEGIN {FS = "...";}'.
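A quick demonstration of the next-record behavior:

    $ printf 'a:b,c\nd:e,f\n' | awk -F: 'NR == 1 {FS = ","} {print $1}'
    a
    d:e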


So is -F, IIRC.


-F has always been supported by real UNIX® AWK; that's where -v and -F come from.


    BUGS
         The -F option is not necessary given the command line variable
         assignment feature; it remains only for backwards compatibility.

    EXAMPLES
         Print and sort the login names of all users:

              BEGIN { FS = ":" }
                    { print $1 | "sort" }

The above is from the GAWK manpage. FWIW, the first example under EXAMPLES uses FS, not -F.

There is nothing wrong with using FS instead of -F.


GAWK is not a real AWK!!! When will you people learn that GNU is not UNIX®?

FS is not meant to be set on the command line, and doing so is asking for trouble. FS is a built-in variable and as such is treated specially.


To pile on :-) you often want the -w (match word) flag to grep.

In awk, I couldn't find how to do this. I tried /\bfoo\b/ and /\<foo\>/ but neither worked. I don't know why, and I don't care enough to dig, which brings me to my major awk irritation ...

It doesn't use extended or Perl REs, which makes it quite different from Ruby, Perl, Python, and Java. Now, according to the man page it does, at least on OSX (man re_format), but as mentioned it didn't work for me.

Details

   $ echo fish | awk  '/\bfish\b/' 
gets nothing, vs

   $ echo fish | perl -ne  '/\bfish\b/ && print' 
fish


UGH! Found the problem; it simply doesn't work. Assuming the OSX awk is the same as the freebsd awk there is a very old open bug on this:

awk(1) does not support word-boundary metacharacters https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171725


GNU awk supports \< and \> for start- and end-of-word anchors, which work with GNU grep/sed as well.

GNU awk also supports \y, which is the same as \b, as well as \B for the opposite (same as GNU grep/sed).

Interestingly, there's a difference between the three types of word anchors:

    $ # \b matches both start and end of word boundaries
    $ # 1st and 3rd line have space as second character
    $ echo 'I have 12, he has 2!' | grep -o '\b..\b'
    I 
    12
    , 
    he
     2

    $ # \< and \> strictly match only start and end word boundaries respectively
    $ echo 'I have 12, he has 2!' | grep -o '\<..\>'
    12
    he

    $ # -w ensures there are no word characters around the matching text
    $ # same as: grep -oP '(?<!\w)..(?!\w)'
    $ echo 'I have 12, he has 2!' | grep -ow '..'
    12
    he
    2!


Sure, but a fair bit of the value of the tool is its consistency across platforms.

There's no point in awk if Perl etc. are ubiquitous and more consistent.


\< and \> work with GNU's awk:

  $ printf "fishstick\nfish\ngoldfish\n" | awk '/\<fish\>/' 
  fish


\b is a Perl RE feature, not ERE. AWK not only supports EREs, but POSIX REs as well.
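If you need a word match in a strictly portable awk, one workaround is to compare whole fields instead of relying on word-boundary anchors; a sketch using the fish example from above:

    $ printf 'fishstick\nfish\ngoldfish\n' | awk '{for (i = 1; i <= NF; i++) if ($i == "fish") {print; next}}'
    fish

(Unlike \b, this treats punctuation as part of the word, so "fish," would not match.)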


On the other hand, grep can be far faster than awk for searching alone. I almost always use an initial grep for the string that will most reduce the input to the rest of the pipeline. After that, it feels idiomatic to mix in awk with matches like you suggested.


Depends on the awk. mawk is surprisingly fast.


Right. I don't consider that particularly exhaustive at all, and this has helped me when I wanted to do quick searches.


I always forget about that, and I should try more to remember it. Thank you for the tip!


I disagree; it's quite elegant if you think in terms of relational algebra operators (see the sketch after this list):

* Projection (Π): awk and cut for simple cases

* Selection (σ): grep for simple cases, otherwise sed & awk

* Rename (ρ): sed

* Set operators: join, comm...
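A toy translation against /etc/passwd as the relation (field positions follow the usual passwd layout):

    # Selection then projection: names of users whose shell is /bin/sh
    awk -F: '$7 == "/bin/sh" {print $1}' /etc/passwd

    # Rename: sed 's/^root:/superuser:/' /etc/passwd

    # Set intersection of two sorted files: comm -12 a.sorted b.sorted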


Bravo! This is one of the most insightful comments I've read in a long time! I have been using some of these tools for years but I never thought of describing them this way. Now I can think of writing a complex query in relational algebra and translating it into these commands in a very natural way.


Here's an interesting article that links shell scripting and relational algebra - http://matt.might.net/articles/sql-in-the-shell/


Indeed, and with a bit of tuning (e.g., using mawk for most things), one can get quite good performance. [1] The project also provides a translator from Datalog to bash scripts. [2]

Disclaimer: I was one of the authors.

[1] https://www.thomasrebele.org/publications/2018_report_bashlo...
[2] https://www.thomasrebele.org/projects/bashlog/datalog


Thank you, and thank you (really, not sarcasm) for the new stuff I have to learn about relational algebra. I'm a huge fan of wide/shallow knowledge that allows me to dive into a subject quickly.


I’m pretty mathsy but I don’t get this


It is from relational algebra, as used in database theory. There is an excerpt from one of the first MOOCs ever offered, available on Lagunita now. [1] It is pretty intuitive once you get the hang of it.

[1] https://lagunita.stanford.edu/courses/DB/RA/SelfPaced/course...


Thank you for this context


ntfsql dreams


Its ubiquity and performance open up all kinds of sophisticated data processing on a huge variety of *nix implementations. Whether it's one liners or giant data scrubs, awk is a tool that you can almost always count on having access to, even in the most restrictive or arcane environments.


It's far more elegant and concise than any other scripting language I can think of using to accomplish the same thing.

As the article points out, other languages will have a lot more ceremony around opening and closing the file, breaking the input into fields, initializing variables, etc.
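A sketch of what that ceremony reduces to in awk: an implicit read loop, automatic field splitting, and BEGIN/END hooks in place of setup and teardown code (the UID >= 1000 cutoff for "regular users" is just an assumption for the example):

    awk 'BEGIN { FS = ":" }               # runs before any input is read
         $3 >= 1000 { n++ }               # per-record pattern/action, fields pre-split
         END { print n, "regular users" }' /etc/passwd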


The practical component of any software engineering degree should include a simple course on common Unix tools, covering grep, awk, sed, PCRE, and git.

A little bit of knowledge here goes a LONG way.


I wholeheartedly agree. I've seen people agonize for days over results from Splunk that they want to turn into something more user-friendly. Fifteen minutes of messing around with basic command-line Unix tools gets that information into a perfect format for their needs.

This is something I need to bring up with my coworkers; I should write some sort of basic guide to Unix tools for them.


> it's not elegant

I completely disagree.


Thank you, I too often talk down what I do.


Eloquently put!



