AWK-ward Ruby (tomayko.com)
124 points by remi on April 26, 2011 | 31 comments



If this article has made you more curious about Awk itself, I highly recommend The Awk Programming Language [1] by Aho, Kernighan and Weinberger. It's a terrific book all around (look at the list of authors), and Awk really is worth knowing if you use the command line a lot. (Credit where it's due: silentbicycle always recommends this book, and I read it on his advice. But I seem to have beaten him to this thread.)

[1] http://cm.bell-labs.com/cm/cs/awkbook (This link seems dead at the moment. I'm very much going to hope that's temporary.)


It's just a bit too late to edit, but the Bell Labs link is already working again.


It's nice to see an article that focuses on the common strengths of programming languages, and what came out of them, instead of the usual X vs. Y content. There is often so much common ground (especially between dynamic programming languages) that many solutions, and much of the knowledge behind them, are easily transferable and could be shared as well, independent of the actual code.


Didn't know Ruby (and Perl?) had BEGIN and END... pretty cool. However I would just write the final example as:

curl -s http://www.gutenberg.org/files/1080/1080.txt | tr ' \t' '\n\n' | sort | uniq -c | sort -n

and avoid any kind of procedural programming.
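
For comparison, the same count can be sketched in Ruby. This is only an illustration: `word_counts` is a hypothetical helper name, and unlike the pipeline above it drops empty tokens instead of counting blank lines.

```ruby
# Hypothetical helper mirroring: tr ' \t' '\n\n' | sort | uniq -c | sort -n
# Assumption: empty tokens are dropped, whereas the shell pipeline would
# count blank lines as an entry of their own.
def word_counts(text)
  counts = Hash.new(0)
  text.split(/[ \t\n]+/).each do |word|
    counts[word] += 1 unless word.empty?
  end
  # Sort ascending by count (like sort -n), breaking ties alphabetically.
  counts.sort_by { |word, count| [count, word] }
end

word_counts("a b a\tc\nb a").each do |word, count|
  puts format("%7d %s", count, word)
end
```

The hash plays the role of `sort | uniq -c`, and the final `sort_by` stands in for `sort -n`.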


Didn't know Ruby (and Perl?) had BEGIN and END...

Along with BEGIN & END there is also UNITCHECK, CHECK & INIT in Perl: http://perldoc.perl.org/perlmod.html#BEGIN%2c-UNITCHECK%2c-C...

These named blocks are used under the hood in Perl all the time. For example:

    foo();  # => bar
    sub foo { say "bar" }
The above works because the last line is interpreted as:

    BEGIN { *foo = sub { say "bar" } }
And so is compiled before the foo() line is reached/run.

PS. Perl6 goes even further and calls them Phasers: http://feather.perl6.nl/syn/S04.html#Phasers


Me too :) Actually, check out the talk mentioned at the beginning of the essay. Pretty sure I built that exact pipeline at some point near the middle.



Blocks are also available in Perl, in the form of anonymous subroutines. They follow Perlish conventions for argument handling (rather than having a neat inline syntax like Smalltalk or Ruby), but they're most definitely the same thing!

  my $block   = sub {             say "w00t, blocks!"   };
  my $closure = sub { $block->(); say "w00t, closures!" };
You'll have to look further for something Ruby does that Perl doesn't :)


And in Perl6 you can drop the sub bit:

    my $block = { say "w00t, blocks!" };
And there are also pointy blocks for neater stuff:

    my $pointy_block = -> $text { say "$text, blocks" };
    $pointy_block("ne@t");
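
For anyone reading from the Ruby side, here is roughly how the same three ideas might look there. It is a sketch, not a claim of exact equivalence; the method names `block_demo` and `pointy_demo` are made up for illustration.

```ruby
# Ruby counterparts of the Perl snippets above (all names are illustrative).
def block_demo
  block = proc { "w00t, blocks!" }
  # The closure captures `block` from the enclosing scope.
  closure = proc { [block.call, "w00t, closures!"] }
  closure.call
end

# A lambda literal plays the role of a pointy block taking one argument.
def pointy_demo(text)
  pointy_block = ->(t) { "#{t}, blocks" }
  pointy_block.call(text)
end

puts block_demo
puts pointy_demo("ne@t")
```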


Yeah, Smalltalk blocks are almost identical to Ruby blocks.

The article just acted like that was a Ruby invention (which they corrected).

My comment had nothing to do with Perl.


I probably should have replied to rtomayko's comment rather than your own. I'd edit the tree, but it's immutable!


Updated. Thanks!


Awk is one of those languages in which learning the syntax almost automatically teaches you to think differently. In Awk's case, that means thinking in terms of tables, maintaining state between rows, and phases in execution while processing data. I find it as much of a mind f* as Lisp or Prolog, though in a very workaday unixy sort of way.
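
To make the "state between rows" and "phases" idea concrete, here is a small Ruby sketch of that shape. The helper name `sum_column` and the sample data are assumptions for illustration only.

```ruby
# Hypothetical helper: sums one whitespace-separated column, AWK-style.
# Roughly: awk 'NF >= 2 { total += $2; rows++ } END { print total, rows }'
def sum_column(lines, column)
  # BEGIN phase: set up state before any input is read.
  total = 0.0
  rows = 0
  # Main phase: runs once per record, with state carried between rows.
  lines.each do |line|
    fields = line.split
    next if fields.length < column # skip rows that lack the column
    total += fields[column - 1].to_f
    rows += 1
  end
  # END phase: report once all input has been consumed.
  { total: total, rows: rows }
end

p sum_column(["apples 3", "pears 4.5", "", "plums 2"], 2)
```

The three comments mark where AWK's BEGIN block, pattern-action body, and END block would sit.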


I feel there's often too much focus on the execution phases in introductory texts. I've done quite a bit of scripting in a proprietary derivative of AWK (primary difference: multidimensional arrays and structs) and I've never used the execution phases in any of those scripts. I really don't see them as an inherent, defining part of the language.


This is very cool. It sort of ruins the approach I use when teaching people to program in Ruby or Python where I have them write some code to calculate stats on a text file then show them how trivial it would have been in awk.


Similar Perl/Awk syntax for Python one-liners with BEGIN/END: http://code.activestate.com/recipes/577075-pyliner-script-to...


The -e is what tells perl that "this is code, not a filename." One combination I regularly use (with perl) that doesn't seem to be mentioned yet is -p -i -e, which modifies files in place, line by line. This is very nice for global search/replace actions on files found via find, grep or ack.


Yup, -i is very very useful, though I prefer to always call it as -i.bak (=> do the edit in place and keep a backup of the original file as "filename.bak"), in case I do something hideously wrong.

For what it's worth (you may know this but maybe not everyone does), Ruby also provides the -i flag.
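
As a rough picture of what the -p -i.bak -e combination does for a single file, here is a Ruby sketch. `edit_in_place` is a hypothetical helper, not how either interpreter implements the flag, and the file names are made up.

```ruby
require "tmpdir"

# Hypothetical helper approximating -p -i.bak -e for one file:
# keep a copy at path + extension, then rewrite the file line by line.
def edit_in_place(path, backup_ext = ".bak")
  original = File.read(path)
  File.write(path + backup_ext, original)
  File.write(path, original.each_line.map { |line| yield line }.join)
end

# Usage on a throwaway file in a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "notes.txt")
  File.write(path, "foo one\nfoo two\n")
  edit_in_place(path) { |line| line.sub("foo", "bar") }
  puts File.read(path)          # edited contents
  puts File.read(path + ".bak") # untouched backup
end
```

The backup is written before the edit, so a bad substitution still leaves the original recoverable.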


Nice article. I use AWK and sed all the time. I still prefer writing some piped shell commands in one (sometimes lengthy) line to writing a proper Perl/Python/Ruby script, unless one is really needed. Why? It (paradoxically?) seems more natural to me in shell environments (and often can be done much more quickly that way).

Example of AWK usage from my old (currently unmaintained! and insecure!) pcspk project (http://download.przemoc.net/pcspk) is "Siemens ringtone converter": https://gist.github.com/943386 (actually it requires gawk, which has nice extensions)


This is both the most useful and engaging introduction to AWK that I've read and, by showing how to replace it (almost drop-in) with Ruby or Perl, the most intense deprecation signal, which creates a terrible conflict in me.

"print $1" was the most I ever used AWK for. So when I read the first part (whose examples made great sense to me) I shouted "AWK is awesome! I should learn it now!". Then the last part came in and showed me how I could use a language I already know to replace it precisely. I feel both compelled to leverage my current knowledge right now to solve another class of problems, and sad not to be learning a completely new thing.


I dare to suggest using $;=/[^a-zA-Z]+/ in the BEGIN block, then you can get away with using only split.each on the second line, and it's more perl-awk-ish :)


Even more perl-awk-ish would be to use the -F and -a switch and move the regexp onto the command line, eg: https://gist.github.com/942975
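
To show what that separator does to a line, here is a minimal Ruby sketch. It passes the pattern to `split` directly instead of assigning it to `$;`, and `letter_words` is a hypothetical name chosen for the example.

```ruby
# Hypothetical helper: split a line on runs of non-letters, as the
# suggested separator /[^a-zA-Z]+/ would. The pattern is passed to
# split explicitly rather than set globally through $;.
def letter_words(line)
  # A line that starts with a non-letter yields a leading empty string,
  # so drop empties before returning.
  line.split(/[^a-zA-Z]+/).reject(&:empty?)
end

p letter_words("Hello, world! It's 2011.")
```

Note that digits and apostrophes count as separators here, so "It's" comes out as two tokens.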


Primitive network calculator I once wrote in awk:

https://gist.github.com/7bb70e1065f085b46a00


Is it just me or is the video of the talk at http://shellhaters.heroku.com/ broken?

On closer examination I noticed the video is here http://confreaks.net/system/assets/datas/1177/original/363-g...


Simplified C++ inspired by AWK: http://github.com/lvv/scc


Isn't there an AI course taught somewhere that only uses AWK? I'm sure I remember reading about that somewhere.


It's Awk, not AWK.

So 'ruby -n' acts like Awk? Neat.


GNU's is "Awk" but the language and most early implementations are referred to as "AWK" from everything I've read. More here:

http://en.wikipedia.org/wiki/AWK


Huh, I think you're right -- thanks for the correction. Maybe they ought to follow LISP's example, though!


FWIW, I was copied on an email exchange with Brian Kernighan regarding the piece and he referred to AWK as "Awk". Thought you'd like to know :)


Heh. I'm trying to remember how it appeared in A, K, & W's book: was it 'AWK' with the W and K in small caps? That'd explain the confusion. There's a copy on its way to me now. (It's a surprisingly great book.)



