Show HN: Awk-JVM – A toy JVM in Awk

rethab · on June 23, 2020

Author here.

As I wrote in the README, this uses GAWK instead of plain AWK. Conveniences of GAWK over AWK in a nutshell:
- functions
- several additional functions (e.g. bit shifting)

But even GAWK lacks some things that are very common in other languages:
- no variable scope: imagine calling another function in a for loop, and the other function again running a for loop. If both loops use 'i' as the counter, good luck. The workaround for this is to declare the local variables as parameters that are not passed (and separate them with four spaces).
- cannot return an array from a function. The workaround is to use pass-by-reference (not sure if the precise definition is applicable here).
- arrays cannot be assigned to another variable. The workaround is to loop over the array and assign it value by value.
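For readers who have not seen these three workarounds, here is a minimal sketch of how they usually look. It is not taken from jvm.awk: the names demo_workarounds, sum_to, squares and copy are made up for illustration, and it sticks to features that plain POSIX awk should also accept.

```sh
# Hypothetical demo, not code from awk-jvm.
demo_workarounds() {
    awk '
    # i and sum are listed after the real parameter and never passed,
    # so they are local to sum_to().
    function sum_to(n,    i, sum) {
        for (i = 1; i <= n; i++) sum += i
        return sum
    }
    # Arrays are passed by reference: the caller sees what squares() writes.
    function squares(out, n,    i) {
        for (i = 1; i <= n; i++) out[i] = i * i
    }
    # Element-by-element copy, standing in for the missing array assignment.
    function copy(src, dst,    k) {
        for (k in src) dst[k] = src[k]
    }
    BEGIN {
        for (i = 1; i <= 3; i++) total += sum_to(4)   # the global i survives the call
        printf "i=%d total=%d\n", i, total
        squares(a, 3)
        copy(a, b)
        printf "b[3]=%d\n", b[3]
    }'
}
demo_workarounds
```

The copy here is shallow, so with gawk's arrays of arrays it would need to recurse into subarrays.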

If anybody knows better workarounds, please let me know :)

wahern · on June 23, 2020

It might be useful to provide an example using POSIX od(1) instead of hexdump(1). hexdump is actually a BSD utility. The Debian Linux version is ported from FreeBSD, while the Red Hat version is a pale imitation provided as a wrapper, IIRC, around the od implementation.[1] `od -An -tu1 -v` should suffice.
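A rough sketch of what that could look like, assuming the awk side just wants one decimal byte value per line. The helper name to_bytes is made up, and the four test bytes are the classfile magic number 0xCAFEBABE written as octal escapes.

```sh
# Hypothetical helper: turn stdin into one unsigned decimal byte per line.
# od pads and wraps its output differently across systems, so awk is used
# here only to flatten the columns.
to_bytes() {
    od -An -tu1 -v | awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# 0xCA 0xFE 0xBA 0xBE, the first four bytes of any classfile
printf '\312\376\272\276' | to_bytes
```

The -v flag matters: without it od collapses repeated lines into a `*`, which would silently drop bytes.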

[1] I wrote a single-file hexdump implementation so know far too much about it: https://25thandclement.com/~william/projects/hexdump.c.html

mzs · on June 23, 2020

Nice project! It's only very old awks that don't support functions (like BSD awk does and nawk does on SunOS but /usr/bin/awk doesn't) and the GNU docs have an implementation of strtonum: https://www.gnu.org/software/gawk/manual/html_node/Strtonum-...
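To give an idea of how small such a replacement can be, here is a simplified sketch that only handles hexadecimal. It is not the version from the GNU manual (that one also covers octal and decimal input), and the name hex2dec is made up.

```sh
# Hypothetical hex-only stand-in for strtonum(), using only POSIX awk features.
hex2dec() {
    awk -v h="$1" '
    function hex2dec(s,    i, n, d) {
        s = tolower(s)
        sub(/^0x/, "", s)              # accept an optional 0x prefix
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1)) - 1
            if (d < 0) return -1       # not a hex digit
            n = n * 16 + d
        }
        return n
    }
    BEGIN { print hex2dec(h) }'
}
hex2dec 0xCAFE
```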

wahern · on June 23, 2020

POSIX awk supports user-defined functions. I'm surprised Solaris' default awk doesn't, but /usr/xpg4/bin/awk does support user-defined functions. (I just confirmed both behaviors on Solaris 11.4.)

flukus · on June 23, 2020

> the workaround for this is to declare the local variables as parameters that are not passed (and separate them with four spaces) - cannot return array from a function.

Does anyone know the history of this? It sounds like something that's too crazy to have been designed and came about as a hack that got widely adopted. It would be nice if they added a local keyword or a prefix or something.

userbinator · on June 24, 2020

I thought from the if/else chain in the mainloop that it appears to lack a switch() statement too, but the GAWK documentation indicates that it does have one.

ufo · on June 23, 2020

Honestly, maybe the most robust workaround is to rewrite it in another language... Lua might be a good candidate: it has a similar syntax and also uses 1-based arrays.

wahern · on June 23, 2020

But awk is usually part of the default system install, unlike Lua. And while GNU awk isn't as common, it's still far more likely to be available than Lua, especially on Linux systems. (Though, per elsethread, it shouldn't be too difficult to port to POSIX awk, or the superset of "portable" awk.)

Twirrim · on June 24, 2020

The odds of finding a linux system without perl5 on it are pretty slim. How much do you hate yourself? Is it enough to want to implement a JVM in perl.... :D

tyingq · on June 24, 2020

Perl comes with an awk to Perl translator...

Twirrim · on June 24, 2020

Hold my beer....?

tyingq · on June 23, 2020

Not the main point, and this is a very cool dancing bear. But, on this point:

"since none of the awks can read binary, you first need to pipe the classfile through hexdump"

Gawk works fine with binary files for me. Using FIELDWIDTHS for fixed length records or the readfile() extension to slurp in a whole file works fine. The readline() function can also be paired with FIELDWIDTHS to read a fixed number of bytes. Newline separated records with nulls in them also read as expected. I'm curious what problems the author saw with binary and gawk.

rethab · on June 24, 2020

I actually simply didn't know it was possible based on some googling. But thanks for the hint. Someone also opened an issue on GitHub giving me a hint on how to replace hexdump[0]. I'll definitely give this a try :)

0. https://github.com/rethab/awk-jvm/issues/1

WFHRenaissance · on June 23, 2020

This is definitely an accomplishment. Someone show this to Kernighan.

siraben · on June 24, 2020

The AWK Programming Language book is great not just for learning AWK but also has chapters on data processing, generating tables and graphs, relational databases and even a VM! I've implemented the assembler and VM from the book and extended the instruction set.[0]

[0] https://github.com/siraben/awk-vm

ketanmaheshwari · on June 23, 2020

METHODS[m]["attributes"][a]["data"][4]

Is this a five dimensional array? How does it get populated?

EDIT: I see it now in the code. Excellent!

leoh · on June 23, 2020

Yes, a bit tricky. You don't need to declare arrays in AWK, amazingly. Interesting how long it takes to get ergonomic gestures just right in languages.

https://www.gnu.org/software/gawk/manual/html_node/Array-Exa...

tyingq · on June 23, 2020

The technical term is a funny word..."autovivification". Perl does this as well.

zserge · on June 23, 2020

AWK is a very underappreciated language from the past, simple and fun to use. You did a very nice job, thanks!

enriquto · on June 23, 2020

What do you mean by "from the past"? All existing languages are from the past.

EDIT: what would be a language that is "not from the past"? Certainly Python (1991) is from the past if Awk (1977) is. The origin of Python is twice as close to the origin of awk as to the present time.

yjftsjthsd-h · on June 23, 2020

I'd argue that changes and improvements count almost as much as start date. Today, in 2020, "Python" mostly means "python3", and practically means "python>3.5 or so", which is not super recent but is a lot younger than 1991. Has AWK changed significantly in the last 30 years, or does current awk basically look the same as it did in System V?

acqq · on June 23, 2020

> Has AWK changed significantly in the last 30 years

Actually it did. Gawk, exactly the dialect used by the author of OP, recently got some very new and, in the context of awk language dialects, advanced features.

https://www.gnu.org/software/gawk/manual/html_node/Namespace...

The introduction of "namespaces" construct allows it to work with older programs but also construction of "libraries" in separate files.

Anyway, I really like the OP implementation, everybody should take a look at:

https://github.com/rethab/awk-jvm/blob/master/jvm.awk

It's based on the https://github.com/zserge/tojvm/blob/master/vm.go but I consider the awk version really more "elegant" in some my-own-taste sense.

loudmax · on June 23, 2020

My first thought reading the headline was that this was an implementation of Awk that ran on a JVM. But no, this is the reverse of that. This way is far less useful but infinitely more interesting. Bravo!

userbinator · on June 24, 2020

Such a thing does exist: http://jawk.sourceforge.net/

...and that naturally leads to pondering whether an implementation of AWK that runs on the JVM can then run an implementation of the JVM that runs on AWK, and vice-versa...

tennineeight · on June 24, 2020

If it's all Turing complete, I see no reason why it can't be possible, theoretically speaking.

tyingq · on June 24, 2020

Added a pull request that wraps typeof() and calls a "polyfill" if the typeof() function doesn't exist. Gawk didn't have that until v4.2.

exabrial · on June 24, 2020

Google is failing me, but I believe someone wrote a JVM in Excel too. When all you have is a hammer... :)