The Awk Programming Language (1988) [pdf] (archive.org)
415 points by dang on Jan 21, 2017 | 103 comments



One of my favorite books - I initially bought a copy based on a review by Brandon Rhodes [0]:

> But the real reason to learn awk is to have an excuse to read the superb book The AWK Programming Language by its authors Aho, Kernighan, and Weinberger. You would think, from the name, that it simply teaches you awk. Actually, that is just the beginning. Launching into the vast array of problems that can be tackled once one is using a concise scripting language that makes string manipulation easy — and awk was one of the first — it proceeds to teach the reader how to implement a database, a parser, an interpreter, and (if memory serves me) a compiler for a small project-specific computer language! If only they had also programmed an example operating system using awk, the book would have been a fairly complete survey introduction to computer science!

[0]: http://stackoverflow.com/a/703174/2912179



The parser section is great. Recursive-descent parsing is something that everyone should be exposed to and practice at least once, because it's so elegant and simple.
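Not the book's code, just a minimal sketch of the style in awk, assuming one expression per line with space-separated tokens (e.g. "1 + 2 * 3"):

    # Grammar: expr -> term { "+" term }    term -> factor { "*" factor }
    #          factor -> NUMBER | "(" expr ")"
    function advance() { tok = (++pos <= n) ? toks[pos] : "" }

    function parse_factor(    v) {
        if (tok == "(") { advance(); v = parse_expr(); advance() }   # skip "(" and ")"
        else { v = tok + 0; advance() }                              # a number
        return v
    }

    function parse_term(    v) {
        v = parse_factor()
        while (tok == "*") { advance(); v *= parse_factor() }
        return v
    }

    function parse_expr(    v) {
        v = parse_term()
        while (tok == "+") { advance(); v += parse_term() }
        return v
    }

    {
        n = split($0, toks, " ")    # whitespace-separated tokens
        pos = 0; advance()
        print parse_expr()          # "1 + 2 * 3" prints 7
    }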


I wrote a compiler in awk!

To bytecode; I wanted to use the awk-based compiler as the initial bootstrap stage for a self-hosted compiler. Disturbingly, it worked fine. Disappointingly, it was actually faster than the self-hosted version. But it's so not the right language to write compilers in. Not having actual datastructures was a problem. But it was a surprisingly clean 1.5kloc or so. awk's still my go-to language for tiny, one-shot programming and text processing tasks.

http://cowlark.com/mercat (near the bottom)

(...oh god, I wrote that in 1997?)


I've always thought that AWK's most important feature is its self limiting nature: no one would ever contemplate writing an AWK program longer than a page, but once Perl exists the world is doomed to have nuclear reactors driven by millions of lines of regexps.

But no, there's always one. :-)


> I've always thought that AWK's most important feature is its self limiting nature

I agree. This idea doesn't receive enough attention. If you pick your constraints you can make a particular envelope of uses easy and ones you don't care about hard.

AWK's choice to be a per-line processor, with optional sections for processing before and after all lines, is self-limiting, but it defines a useful envelope of use.
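The whole envelope fits in three kinds of blocks; a minimal sketch (the field number and threshold are just placeholders):

    BEGIN { FS = "," }                          # once, before any input
    $3 > 100 { count++ }                        # per line, only when the pattern matches
    END { print count + 0, "rows over 100" }    # once, after the last line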


I've written one or two awk programs that probably went beyond what the tool was intended for, but mostly I use short one-liners or small scripts. I use awk, grep, sed, and xargs pretty much daily for all kinds of ad-hoc automation.


> beyond what the tool was intended for

Not sure what that would mean. I think the tool was designed to be a user's programming language. I like to think that `awk` was the Excel + VBScript of its day.


VBScript was largely replaced on Windows by Powershell. Awk is still popular for what it's good at.


Fair point. I guess I meant more what I thought it was intended for, i.e. mainly smallish text transforms where the entire program is given on the command line, often as part of a pipeline of several different utilities.


I'm bookmarking that. The reason is a discussion David Wheeler and I had about countering compiler subversion. We need to bootstrap the C compiler with something trustworthy & local. I looked into Perl since it's industrial strength and widely deployed. He mentioned bash since most (all?) UNIXes had it. My next thought was converting a small, non-optimizing compiler's source to bash or awk. So crazy it might work.

Now, you've posted a compiler/interpreter in awk for a C-like language that could allow easier porting of a C compiler's source. Hmmm. The license would have to be BSD so the BSD's could use it, too. Or pieces of it in my own solution.

I have a feeling whatever comes out of this won't make it into next edition of Beautiful Code. ;)


Now that I look at it, I see that it's not open source --- I'll add a cover note saying it's actually BSD.

Let me also say that if you actually want to use this for anything you're crazy. I wrote it when I was... younger... and when I had no idea what I was doing. The only thing it's useful for these days is looking at and laughing at.


I figure it might give me ideas for how to express some compiler concepts in awk. What I plan to do is the most brain-dead, simple implementation possible, so anyone can understand & vet it.


Almost relevant: I wrote a parser generator in and for awk (called 'yawk' even though it did LL(1) instead of LR grammars), even older than this. But at some point I lost it, and it was never released online.


Which do you think would be better in terms of coming with all major distros and being easiest to write a compiler in: awk or bash? I've forgotten both due to injury, so I can't tell without lots of experimenting.


I've never done any serious programming with bash, just simple Bourne shell scripts, because I don't want to think about all the escaping rules and such. I did write some programs in Awk in the 90s (notably https://github.com/darius/awklisp), so I'd go with that. Maybe someone who's bent bash to their will could speak up here?

AFAIK they're both ubiquitous, though you might need a particular awk like gawk for library functions, depending on what you need to do. Nowadays I'm way more likely to use Python, though of course it's a much bigger dependency.

Sorry about the injury, and good luck -- I'd like to hear how it goes.


The escaping in Bash can be a pain. I was recently writing an execution wrapper in Bash, and needed to send the results via JSON. Fighting with the quotes was almost enough to make me throw in the towel and move to a language with a builtin JSON parser, but I ran across this technique, of embedding a heredoc to preserve quotes in a variable. https://gist.github.com/kdabir/9c086970e0b1a53c3df491b20fcb0... It 'simplified' things and kept them readable.

Thanks for sharing awklisp. Nice reading for a Sunday morning.


I'm glad you enjoyed that, thanks. :)


Thanks for publishing it. I had long thought about writing a compiler in Awk. Finding yours through a comment here on HN some time ago served as a major validation of the idea. I ended up writing one.

Here is the result: https://github.com/dbohdan/all-caps-basic. It targets C and uses libgc along with antirez's sds for strings. The multi-pass design with each pass consuming and producing text is intended to make the intermediate results easy to inspect, making the compiler a kind of working model. The passes are also meant to be replaceable, so you could theoretically replace the C dependencies with something else or generate native code directly in Awk. You can see some code examples in test/. Unfortunately, the compiler is very incomplete. I mean to come back to it at least to publish an implementation of arrays.


The combination of "it worked fine" and "so not the right language" is intriguing. You wrote about the lack of data structures, can you share more (in both directions)?


Bear in mind that this was twenty years ago, so it's not exactly fresh in my mind; but basically: it was intended to do one job, once, which was to compile the real compiler (written in a simplified version of the language) into bytecode. Once that worked, I would never need to touch it again.

Which meant that it was perfectly allowable for it to be hacky and non-future proof, which it was.

Here's part of the code which read local variables definitions (in C-like syntax):

    function do_local( \
    nt, st, n, t){
        nt = readtype()
        st = tokens
        ensuretoken(readtoken(), token_word)
        n = tokens
        outofscope(n, 1)
        ldb[n] = "var"
        ldb[n, "type"] = st
        ldb[n, "sp"] = sp - spmark
        emit("# local " n " of type " st " at " (sp-spmark) "\n")
tokens is the value of the current token, ldb is the symbol table; you can see how I'm faking a structure using an associative array indexed by keyword.

There's nothing actually very wrong with this code, but there's no type safety, barely any checking for undefined variables, no checking for mistyped structure field names, no proper data types at all, in fact... awk wouldn't scale for making the compiler much more complicated than it currently is. But it did hit the ideal sweet spot for getting something working relatively quickly for a one-shot job. It's still really good at that.


Some points:

* Arrays can be passed to a function only by reference, never by value

* Cannot return an array from a function

* Meta-programming or pointers are only (barely) available in gawk

* There is an `@include` statement for `gawk`, but it is not part of POSIX, and there is no namespacing involved.

* Function names can only exist in the global namespace

There are some reasons somebody felt an urge to create perl... Still loving awk, and using it every day for text processing jobs.
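A minimal sketch of the usual workaround for the first two points (my own example): a function can't return an array, but it can fill in one that the caller passes, since arrays go to functions by reference.

    function split_fields(line, out,    n) {   # 'out' is filled in place
        n = split(line, out, ",")
        return n                               # only scalars can be returned
    }

    BEGIN {
        n = split_fields("a,b,c", fields)
        for (i = 1; i <= n; i++) print i, fields[i]
    }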


>There are some reasons somebody felt an urge to create perl

Larry Wall (creator of Perl) says something pretty close to that here[1]:

"I was too lazy to do it in awk because it would have been hard to get awk to jump through the hoops I was wanting it to jump through. I was too impatient to wait for awk to finish because it was so slow. And finally, I had the hubris to think I could do better."

[1]http://www.linuxjournal.com/article/3394


I still think Awk is better for one-liners, but Perl gets the advantage for full size programs.


I actually found it really interesting that he was working on a high-assurance VPN when he created it to reduce his grunt work:

http://cahighways.org/wordpress/?p=8019

http://ieeexplore.ieee.org/document/213253/

BLACKER's heavy lifting in security was done by a high-assurance kernel called GEMSOS:

http://www.cse.psu.edu/~trj1/cse443-s12/docs/ch6.pdf

It was a classified work for TRW whose details took a long time to get released. Might be why he rarely mentioned the BLACKER project in its origins. Possibly trying to obfuscate it a bit to avoid breaking laws.


http://cowlark.com/mercat/com.awk.txt

This is neat! Question for you regarding the formatting/syntax used-

Why the slash-newlines in function declarations?

E.g.

    function scope(name, \
    s) {


Awk does not support local variables. However, to simulate local variables you can add extra function parameters. I would guess that the backslash is inserted to separate the "real" parameters from the "local variable" parameters to make the code more readable. In your example, when the function `scope' is called only one actual parameter would be provided.
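A tiny illustration (hypothetical function, not from the linked code): callers supply only the first argument, and the trailing parameter just gives the function a fresh local on every call instead of a global.

    function quoted(s,    tmp) {      # callers pass only 's'
        tmp = "\"" s "\""             # 'tmp' starts out uninitialized
        return tmp
    }
    BEGIN { print quoted("hello") }   # prints "hello" (with the quotes)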


Last year I dug up Kernighan's 2012 release of awk, fixed up the test suite packaging and automated it, and wrote a makefile which adds clang ASAN support.

It found a couple bugs because the test suite is quite comprehensive. I think it's somewhat interesting that 5000 or so lines of C code polished over 20 years still has memory bugs.

I didn't fix the bugs, but anyone should feel free to clone it and maybe get some karma points from Kernighan. Maybe he will make a 2017 release. He is fairly responsive to email from what I can tell :)

https://github.com/andychu/bwk


" I think it's somewhat interesting that 5000 or so lines of C code polished over 20 years still has memory bugs."

Typical might be the word you're looking for there. The CompSci people doing static analysis of C programs often apply their tools to popular FOSS. They find new errors just about every time.


As much as I defend C, if you're using C in a non-embedded environment, and you're handling any sort of textual input... don't. In fact, even if you're not handling textual input, think about not doing it.


Written by someone you'd have to call a "competent C programmer", to boot.


Yeah, I think the real problem is that the functional tests actually pass (on my machine, and most, I would assume).

But the C code is invalid. In many circumstances, a one byte buffer overrun is exceedingly likely to "work", but ASAN flags those as bugs with 100% reliability.

If I recall, it also had either a use-after-free or a double free. The former can obviously cause problems but may not; not sure about the latter.


I've got the source code to the book (in English and French) as well as to awk. How? I sent email to bwk that we were trying to extend awk to be sort of threaded (think awk scripts as first class, so you have awk foo { } awk bar { } and you could do foo | bar). We called it bawk, BitMover's awk.

Anyhow, I asked Brian if we could base it off the one true awk and he tarred up ~bwk/awk and sent it to me.

I love that guy, the culture of the Bell Labs people and the people that worked with them is great.

I've stolen a bunch of awk ideas over the years. BitKeeper (first DSCM) has a programming "language" for digging info out of the repository. For example, this:

http://www.mcvoy.com/lm/bkdocs/dspec-changes-json-v.txt

prints out the repo history as a json stream. One of my guys said that it couldn't be done, heh, it could be :)

Everyone should learn some awk, it's so handy.


One of my commonly used Unix one-liners, using awk, is to get the sum of the file sizes for the files listed by the ls command (with the -R for recursive option if wanted):

    ls -lR /path/to/dir | awk ' { s += $5 } END { print s / 1024 " K" } '

$5 is the 5th field of the output, which is the file size field in the case of ls output. The code inside the first set of braces runs once for every line of input (which comes from standard input, so from the ls command, in this case), and the code inside the second set of braces runs at the end of the input, calculating and printing the desired result of the total of all file sizes for files found by ls, in kilobytes. It can easily be changed to output the total in bytes or megabytes by dropping the '/ 1024' or adding another one after the first. Variable s is initialized to 0 by default at the start.

You can get similar info with "du -hs /path/to/dir" but the ls plus awk pipeline lends itself to more customization, such as adding conditions for the type or owner of the file, etc.
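For example, restricting the sum to one owner's files is a one-clause change ("alice" is a placeholder; $3 is the owner column in ls -l output):

    ls -lR /path/to/dir | awk ' $3 == "alice" { s += $5 } END { print s / 1024 " K" } '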


I'd use the find command with the -printf option (GNU find has this option but POSIX find doesn't define it) instead of ls. For instance:

    find /path/to/dir -type f -printf "%s\n" | awk ' { s += $0 } END { print s " bytes" } '

The find command has much more powerful file-filtering capabilities than the ls command and works better with weird characters in filenames.


Thanks. Yes, I'm aware that in general find is a better option (long time Unix guy) than even a recursive ls command (ls -R) for finding files under a directory and processing them in some way (often together with xargs, to get around the args length limit). But mine was just a quick example, so I didn't use find. Actually, find is also better for this example, because with it, you do not have to deal with per-dir header lines like "dirname:" and "total n" (n blocks) that ls outputs. (The headers may not matter for my example, because I only process field 5, but they can matter for other kinds of processing of the output.)

There is also the -print0 option to find to handle filenames with newlines in them.

-print0 may be non-POSIX and a GNU extension.

POSIX has -print, but interestingly, in some Unixes I have seen that not using -print still prints the filenames found, by default.


That's the expected behaviour. Quoting from spec:

If no expression is present, -print shall be used as the expression. Otherwise, if the given expression does not contain any of the primaries -exec, -ok, or -print, the given expression shall be effectively replaced by: ( given_expression ) -print

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/fi...


Yes, I wasn't implying the behavior is wrong. Was just mentioning it. Anyway, thanks for that link, which explains why. That Open Group info on POSIX utilities is a great resource for when you want to know the comprehensive, well-specified behavior of the commands.


And your way breaks if a filename contains a newline.


You can fix that by adding -B (or -b) to the ls command.


Neither the -b nor the -B option is defined in the POSIX ls utility. The -b option is a GNU extension to print C-style escapes for nongraphic characters, and it is useful for this case. The -B option is also a GNU extension (not listing implied entries ending with ~), and I don't understand how it is relevant to this case.


On OS X and BSD (at least, and possibly others) the -B option forces octal-style \xxx output for non-printable characters, while -b uses C-style escapes where possible.

You are correct regarding the GNU option -B, which would not be relevant here, and regarding the fact that neither of these options are in POSIX. Thanks for pointing out those limitations.


PolyAWK (by Polytron), which included a copy of this book with each unit sold (see sticker on the cover of the linked PDF), was a _favorite_ tool of mine "back in the DOS (and early Windows) days". It was IIRC developed by Thompson Automation Software[0], who later sold the software package directly. The Thompson Automation Awk package included an awesome _awk compiler_, allowing creation of standalone .EXE files (using a 32-bit DOS extender, and later a Win32 version) from 1+ awk source files. The compiler presumably generated bytecode which was bundled into the .EXE file along with a 32-bit runtime which provided data capacity sufficient for a wide range of real-world projects. Anyway, TAWK gave me a huge productivity boost for a number of years during a time when such languages were only beginning to become available on the PC platform. And the ability to create single-file standalone EXE files greatly eased distribution of the tools I created. Good times.

[0] http://www.tasoft.com/


I reviewed the compiler in an old issue of DDJ:

http://www.drdobbs.com/tools/examining-the-tawk-compiler/184...

I ended up writing a couple of command-line email utility programs with it that I sold, for a while.


Wtf? Why doesn't anything like this exist today? Windows doesn't have any way to create an .exe (that I'm aware of) besides C#, C++, Turbo Basic, and that's about it. All I really want is a way to write terse code and release it to other users without installation of a runtime, etc. I can't even distribute PS, because you can't guarantee another user has the right version.


I certainly agree with your sentiment. Yet I'm unsure that even C# or C++ _by default_ builds .EXE files that can be copied onto another (same OS) machine and run successfully (for C++ this is almost always possible, but IIRC not default build behavior).

"What happened" was industry-wide standardization on dynamic-linking of prerequisite (library) code (IOW, this code stays in separate DLL files typically stored in system-global locations), leading to the need for "installer" software whose purpose (I presume; I've entirely avoided dealing with that stuff) is to ensure that all prerequisite dynamic libraries are upgraded to the minimum version needed by the SW being installed, replaces any old version(s) of the program with the new, and modifies the Windows Registry in various and sundry ways (can you say "system-global variables run amok"?). The solution which I prefer is to build static-linked .EXEs (binaries) instead of dynamic-linked. Convincing toolchains to do this is a small exercise for the reader. OBTW: I think go (golang) static-links by default.

I stopped using TAWK compiler when I discovered Lua (5.1; IMHO a substantially better language than TAWK (this is not a criticism of TAWK)). I even went so far as to commission a "Lua Compiler" for Win32 which behaved almost identically to the TAWK compiler; I used this with great success for a few years. Unfortunately it was an internal tool which I lost access to when I departed that employer.

P.S. IIRC Borland Delphi also builds static-linked EXEs by default. I wrote one Delphi 2 (Win NT 4.0 era) program whose source code I've kinda lost track of which still runs fine on Win10 x64. TAWK and Lua are more productive languages than Delphi/Pascal, and it's trivially easy to add your own C library functions to Lua (for improved performance or added functionality), so I gravitated toward an overall preference for Lua.


Yea, but neither Lua nor LuaJIT can make a true binary without some hack where you package up the interpreter as well.


It's not complicated, and the interpreter is super light, though.


Forgot to say thanks for the excellent reply!


Golang? Actually has pretty decent windows support (although some projects tend to assume Unix paths etc).

And FreePascal, Nim. Perhaps OCaml (but it might require some magic to generate a standalone exe? I believe unison is available as just an exe file?).

Ed: and rust?


Golang isn't good at fast development although that is a good point. Nim is still pretty immature. OCaml is great on Unix, but appears to be a pain on Windows unless you like Cygwin.


Just checked out the site. Seems they have ceased selling the software. A pity. I bet some people and companies would still buy it. I read the page about TAWK (a much enhanced awk, they say TAWK has added features such that it is a general purpose programming language), and the page about their Thompson Toolkit, which roughly seems like a lighter version of Cygwin / UWin etc., with some DOS features too.


FWIW TAWK and Thompson Toolkit (and competitor MKS (Mortice Kern Systems[0])) offered this functionality (awk and other unix commandline tools for DOS and subsequent MS OS') beginning in "the dialup era" (your purchase bought a box containing a manual (book!) and floppy disks). I would not be surprised to learn that the advent (and easy accessibility via high-speed internet) of Cygwin/UWin etc. killed these commercial tools markets.

[0] https://en.wikipedia.org/wiki/MKS_Inc.


Yes, it could be. I remember reading about MKS Toolkit in computer magazines (both print and online, IIRC) earlier; they used to advertise in them regularly. But many tools - not just of that category but others too - are still being sold; I think they are just not that visible on some forums like this, where the talk tends to be more about the web and the latest technologies.


Plain text version here, but the formatting is off in places: https://archive.org/stream/pdfy-MgN0H1joIoDVoIC7/The_AWK_Pro....


Awk is the #1 language I learned this year for fun.

I wrote a simple command line statistics tool that uses awk to calculate sum, stddev, and more. https://github.com/numcommand
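Not the numcommand code itself, just a minimal sketch of the awk underneath that kind of tool, for one number per line on stdin (assumes at least one input line):

    { n++; sum += $1; sumsq += $1 * $1 }
    END {
        mean = sum / n
        print "count", n
        print "sum", sum
        print "mean", mean
        print "stddev", sqrt(sumsq / n - mean * mean)   # population stddev
    }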


Since this is Hacker News my plea may be answered: does anyone have the artwork or an actual example of the infamous AWK T-shirt? From memory it features a bird jumping (parachuting) out of a plane and is titled with AWK's most famous error message: "awk: bailing out near line 1."


This book should be required reading for anyone looking to write their own tech books.

It's short, clear, and concise. It's useful and helps you solve real problems with AWK. Who could ask for anything more?


Agreed.

The fact that one of the authors is Brian Kernighan is partly why, IMO. Just a few days back, I commented here in reply to someone about the quality of his K&P and K&R books (both of which I've used for training), on Unix and C respectively.


I wish that certain simple tasks in awk were a little less verbose, especially for command line use.

The number one example for me is counting by string in a csv file:

    awk -F',' '{a[$1] +=1} END {for(v in a) print v,a[v]}'

Not that this is particularly difficult stuff, it's just a bit exhausting to find myself typing that over and over again. I'd love a more concise alternative to this.

Also, 'sort | uniq -c' is not a viable alternative for very large files.


Sounds like an opportunity to save it in a file! My $HOME/bin is full of shell scripts containing this sort of stuff so I can avoid retyping them.
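Something along these lines, for instance (the name, field, and separator are arbitrary):

    #!/bin/sh
    # ~/bin/countby: count occurrences of the first comma-separated field on stdin
    exec awk -F',' '{ a[$1] += 1 } END { for (v in a) print v, a[v] }' "$@"

Then `countby < file.csv` (or `countby file.csv`) replaces all the retyping.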


Yes, that approach is normal automation and also part of the Unix philosophy. And of course you can pass command-line arguments to the scripts too.

A useful fact that I've seen some people didn't know: Unix metacharacters such as * and ? (for filename matching) and $ (in various uses such as $*, $#, $!, etc.) are all expanded by the shell, not by individual commands, so the use of metacharacters is actually available to all commands and scripts that are run at the shell prompt - not just to selected or built-in ones.

Contrast that with DOS (at least in earlier versions), which had the problem that some commands supported wildcard characters for filename matching but others did not. You could write your own logic for that using OS API calls (Int 21H etc., IIRC, with calls named like FindFirst and FindNext), but it was not built-in and freely available.

Edited for formatting.


People are often surprised when I mention that awk is Turing complete. It's quite a powerful tool, I can't imagine loving the command line as much as I do without it.


Even sed is Turing complete. I once did a talk about this at Open Source Bridge: http://tech.bluesmoon.info/2008/09/programming-patterns-in-s...


Meaning no disrespect, but I am constantly surprised by how this detail is frequently brought up as if it were an unexpected aspect of a language. Just about any minimal scripting language and many tools are Turing-complete (LaTeX, Minecraft, etc. among the latter).

It's really a low bar to clear. In fact, it would be far more surprising if a language such as awk with counters, conditional statements, the ability to jump to statements (i.e., loops), and the ability to change memory were not Turing complete.

Edit: clarity


I love awk for text processing purposes. When analyzing log files, I often drop down into awk mode to check the particular exceptional case currently under investigation. It's very powerful to be able to say after three minutes: this happens in 0.5% of the cases.

Bought this book second-hand online. One day it costs $150, the next $2. The first bit was an awesome read, but I never got much further; I tend to read much more from $READER. I'm sure this PDF will get me going again!


Man, I need to study some more weird languages. Just got done with the basics of Python and C for my first CS class. Over the summer I want to tackle LISP.


I can definitely recommend the University of Washington's Coursera on Programming languages. It's available here, and starts every few weeks I think: https://www.coursera.org/learn/programming-languages You'll learn SML (a strongly typed functional language), Racket (in the Lisp-scheme family of languages), and Ruby (to compare the previous languages with object-oriented ones). You'll even write your own language on top of Racket. It is a challenging class but I can say for sure that it has changed the way I think about programming.


Don't listen to those other doofuses, if you want to learn a weird language, I've got the one for you: Prolog!

You'll never use it in industry, but a few weeks with Prolog will bend your mind in just the right ways and teach you more about how you can model computation differently than six months with a Lisp. It's also cool as shit and really fun to program in. Prolog is a language you can learn just for the sheer joy in expanding your notions of what programming is, or at least could be.


So you know a bare-metal language and a scripting language. That's essentially opposite sides of the spectrum for industry and a great start. I'd recommend you do C# or Java next though as they are the perfect in between languages. Very fast and industrial strength, but more boilerplate than Python. After that you can jump to CommonLisp, Racket, Haskell, OCaml, F#, Awk, Red, Rust, Crystal, Perl6...etc.


Are the /AA and /ObjStm items a concerning indicator? This is the limit of my familiarity with pdf-id:

    $> python2.7 pdfid.py The_AWK_Programming_Language.pdf
    PDFiD 0.2.1 The_AWK_Programming_Language.pdf
    PDF Header: %PDF-1.6
    ..
    /Page                  0
    /Encrypt               0
    /ObjStm                7
    /JS                    0
    /JavaScript            0
    /AA                    1
    /OpenAction            0
    /AcroForm              0
    /JBIG2Decode         222
    ...
It has /AA, which is an automatic load action, and it has a lot of objects which could contain JavaScript; it would need closer scrutiny, I think.


I posted the above to prompt explanation from someone with expertise in pdf malware to validate the safety of the linked pdf. I'm concerned that it has open actions and objects that could be used to obfuscate js code. The author of pdf-id flags these attributes as requiring further inspection.

Wouldn't a tech pdf of a popular book that is impossible to obtain legally in digital form be an excellent vector to deliver malware to tech users with probably lots of stored credentials to resources?


Robbins' open source book may be of interest as well:

GAWK: Effective AWK Programming

https://www.gnu.org/software/gawk/manual/gawk.pdf


Used AWK a whole lot in the early 90s for massaging source code, mostly to analyse and refactor 1m+ LOC of COBOL, and Awk was brilliant for that. Have used it ever since whenever I needed to process text. Around 2000 I was using it a lot to convert systems by running reports on the old system and then getting the data from the output text files. A clunky way to do it, but faster than typing when there is no way to get the data directly. If a system can print to a text file, then the data is available. I still use awk on Windows, OS X and Linux. It's an essential tool when faced with string/text processing tasks.


Awk as Lisp macro in TXR:

http://www.nongnu.org/txr/txr-manpage.html#N-000264BC

It has direct counterparts to all POSIX features, plus a number of extensions similar to ones found in Gawk, as well as some of its own: for instance, range expressions which freely combine with other expressions (including other range expressions), and range expressions which exclude either or both endpoints.


Literally writing a small awk script, took a break to check Hacker News. Nice.


It triggers my OCD that the names of the authors are in alphabetical order on the cover and not in, you know, the logical order.


Go on... explain why it's not the logical order.


Is there anything that "explains" sed even half as well as this book explains awk? I know how to use basic sed, but I haven't yet completely grokked the way the pattern space and hold space really go together.


The Unix Programming Environment covers sed/ed enough to grok it https://en.m.wikipedia.org/wiki/The_Unix_Programming_Environ...


AWK is still my go-to scripting language for quick tasks, like simple computation and basic data analysis. It is still the best thing in its problem space.

Granted, AWK's problem space is very small, but still...


I've been learning this at work as part of a get-good-at-Linux regime :) One of the most surprising things for me is that, as horrible as some of the command-line one-liners can look to a beginner, it's actually quite a forgiving scripting language. I don't think I've seen another language where you can increment a variable without declaring/initialising it, nor where you can set indices on an array without it being declared (except in a constructor fashion, I guess).
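For example, this is a complete program (the filename and fields are arbitrary); neither count nor total is declared or initialised anywhere:

    awk '{ count++; total[$1] += $2 } END { for (k in total) print k, total[k]; print "lines:", count + 0 }' somefile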


BASIC (the traditional dialects thereof, anyway) does all of those things.

In fact, you can even read indices of an array without declaring it. If you do, it's auto-declared as having 11 elements (with indices 0 to 10), filled with zeroes.

This was still supported as late as VB for DOS. I wouldn't be surprised if VB6 also had this behavior.


Perl does those things if you don't "use strict;"

  $ perl -e '++$i && print $i;$foo[2]=99;$foo[1]=88;for (@foo) {print}'
  18899


Thanks for pointing that out! I haven't used perl yet, it should probably be on my list of things to learn though :)


A fair amount of it is based on awk, so awk might be better to start with.


Perl5 is a really big language...especially when you look at extending it with the 10 gigs of CPAN code. It grew from a need to have a better Awk, but I'm not sure if they're related close enough for starting with Awk to matter. Perl6 is an all around sister language that isn't ready for production yet, but has a ton of power and features including the ability to call other languages (python, perl5, Lua, scheme).


Sure. I meant that though in the context of "I've been learning this at work as part of a get-good-at-Linux regime". Which I suppose involves using awk mostly for command pipelines / one liners. Awk and Perl would have a lot of overlap (autosplit, matching, BEGIN, END, etc) in that space.


Yea, if you're just doing one-liners...probably easier to start with Awk. For more complicated scripts, Perl or Python should be built in.


I used awk until I learned Python (long ago). For me, awk was yet another example of the "worse is better" approach to things so common in unix. For example, if you make a syntax error, you might get a message like "glob: exec error," rather than an informative message. "Worse is better" is probably a good strategy in business and for getting things done, but still, mediocrity and the sense of entitlement that so often goes with carelessness, sickens me.


Awk meshes very well with a lot of my natural inclinations about text processing. I've sadly stopped using it lately as it seems that the majority of my use cases these days run up against a (to me) glaring deficiency in the language. Specifically, capture groups in pattern regexes. It's probably one of those "you're doing it wrong" kind of things, but if awk had that one feature, I probably wouldn't ever need to use perl.
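For what it's worth, gawk (though not POSIX awk) gets partway there: match() takes an optional third argument that it fills with the capture groups, and match() itself can sit in the pattern slot. A small sketch, with a made-up log format:

    # gawk only: pull the user and the duration out of lines like "login user=alice time=42ms"
    match($0, /user=([a-z]+) time=([0-9]+)ms/, m) {
        print m[1], m[2]
    }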




Not sure about GNU, but BSD build systems depend on AWK for building installation media.

crunchgen, a compiled C program, has to call AWK.

Anyone out there do AWK-less builds?

Why did I need to learn a little AWK?

Because I couldn't work out how crunched binaries were built without knowing some AWK.

Best thing about AWK IMO is the C-like syntax.

For anyone learning C and AWK concurrently, this kills two birds with one stone.


I love Awk on Unix. I really wish Windows had something closer to this.


You can get nawk that runs on Windows and run it in PowerShell: http://gnuwin32.sourceforge.net/packages/nawk.htm


You can get it, along with a number of other Unix tools, with MinGW (http://www.mingw.org/).


https://mingw-w64.org/ is actively developed, whereas MinGW appears to be dormant.


Yea, but I've always felt MinGW and Cygwin were kind of hacks.




The printed book is still expensive on Amazon. I guess it's still relevant these days.


I was working with a VOIP startup, and they needed to find some unique numbers in their CDRs (call detail records, basically a CSV list of calls made, duration, etc.).

Loading the file into Excel took literally minutes as Excel tried to parse every field. It bogged down a 16GB RAM machine.

Using awk and uniq, the total run time of getting a solution, including reading the many MB of files and generating a summary into another file, was about 6 seconds.
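The classic awk idiom for that kind of job, for reference (the field number and file glob are placeholders for wherever the phone number actually sits):

    awk -F',' '!seen[$2]++ { print $2 }' *.csv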



