A brief interview with Awk creator Dr. Brian Kernighan (2022)

gregw2 · 2024-07-17T20:43:06.000000Z

Little known random Brian Kernighan facts:

* He joined Princeton’s CS department in 2000 but taught at least one class there as early as 1993 while still at Bell Labs Research (on sabbatical?)

* One of his students regularly brought a 386sx laptop (running pre-1.0 Linux) to class and when Brian was asked more obscure questions about what awk did which he couldn’t remember, the student would run commands in awk and feed Brian the definitive implementation answer. So Brian had some exposure to Linux moderately early on.

* Here’s a writeup from him on putting AT&T’s toll free phone directory on the internet back in fall 1994: https://www.cs.princeton.edu/~bwk/800.html

Affric · 2024-07-18T02:29:36.000000Z

Is gawk the definitive awk implementation or am I thinking of gsed being the one that’s really different?

LukeShu · 2024-07-18T03:06:17.000000Z

I'm not sure what you're thinking of, but there at least 3 prominent and still-relevant awk implementations:

nawk (One True Awk / AT&T awk / BWK awk): the original awk, still maintained; is used by macOS, FreeBSD, OpenBSD, and NetBSD https://github.com/onetrueawk/awk

mawk (Mike's awk): the default awk on Debian https://invisible-island.net/mawk/

gawk (GNU awk): the default awk in lots of other places https://www.gnu.org/software/gawk/

gregw2 · 2024-07-18T07:46:48.000000Z

There were some differences even in the mid 90s between AT&T-style awk which was then considered the original/definitive/most-backward-compatible and gawk which was new/shiny (although not /that/ new).

I believe both were available in Linux back then. I believe in the above anecdote both were consulted with the emphasis on AT&T-style awk for obvious reasons.

smitty1e · 2024-07-18T01:01:04.000000Z

> One of his students

Smells like the post author, no?

rr808 · 2024-07-18T02:42:21.000000Z

Lex Fridman did a good hour and a half interview with BK if you're interested. https://www.youtube.com/watch?v=O9upVbGSBFo

deepspace · 2024-07-18T04:46:42.000000Z

The Computerphile YouTube channel also did many interviews with BK: https://www.youtube.com/results?search_query=computerphile+k...

Sometimes, I feel extremely privileged to be living today. It's like a mathematician being able to interview Euler or a physicist interviewing Newton.

fuzztester · 2024-07-18T05:53:36.000000Z

checking it out.

I didn't like his talks earlier (had only seen parts of a few), don't know why.

but interestingly (to me), this one seems to be good, at least at the start, where I am now.

as the real talk starts after the ads, looking at him, for some reason, I thought he looked like a sinister mafia don wearing a tie. :)

kagakuninja · 2024-07-18T18:40:16.000000Z

Lex seems to love prominent tech bros like Zuckerberg and Musk, and has also done a softball interview of Joe Rogan, in which Rogan was allowed to spew unsubstantiated alt-right statements. If you don't like any of those people, Lex might be a turn-off.

His attitude seems to be that he will interview anyone without pre-judgement, and try to find common ground with them or something. "Woke" people like myself view that as enabling bad actors.

I have listened to some great interviews from him, so it is a mixed bag.

Aaronstotle · 2024-07-18T18:52:40.000000Z

I first found Lex because he would interview really interesting technical people, like Jim Keller.

Like you, I found Lex off-putting in that he will platform anyone, and then will just say Love will prevail, I think its incredibly naive.

rubytubido · 2024-07-19T20:17:52.000000Z

> off-putting in that he will platform anyone

Why? You are not interested to hear different opinions?

kagakuninja · 2024-07-22T03:03:37.000000Z

1) I don't need to hear anything from Ivanka Trump ever, thank you.

2) Platforming dishonest hacks like Tucker Carlson does the nation a disservice, unless he is going to call out all the lies as they happen. While I have not listened to the Carlson interview, based on what I heard from Joe Rogan, I do not believe Friedman will significantly push back on right wing misinformation, for whatever reason.

fuzztester · 2024-07-18T19:27:59.000000Z

his demeanor looks naive to me. it might be a put-on thing, though.

adamrezich · 2024-07-18T22:29:31.000000Z

Why are we still using "alt-right" in 2024 to describe people who do not self-describe as such, just because they espouse beliefs outside the mainstream liberal orthodoxy? 2016 was a long time ago, and Richard Spencer has faded into irrelevant obscurity, so why do people still insist upon using this ill-defined and incendiary term?

sgarland · 2024-07-19T03:55:36.000000Z

How else do you want to define the modern GOP? Their policies and societal views are markedly different from Bush-era. It’s bewildering to think of Romney and Trump being in the same party.

rsdfvn1099 · 2024-07-19T11:49:38.000000Z

Obviously there is no official count but the consensus is there are 6 factions within the GOP. https://en.m.wikipedia.org/wiki/Factions_in_the_Republican_P...

lanstin · 2024-07-18T22:07:41.000000Z

The only one I've listened to was him interviewing Donald Knuth which was worth listening to, tho from time to time I got the impression of an interviewer that didn't quite know enough to do the interview.

If you like podcasts of a very intelligent person interviewing very accomplished guest thinkers, Sean Carroll's Mindcast is slightly higher level: https://www.preposterousuniverse.com/podcast/ (tho I don't think Donald Knuth has been on that one, lots of other computer folks have).

fuzztester · 2024-07-18T22:41:57.000000Z

that sounds interesting, will check it, thanks.

alex_lav · 2024-07-18T21:20:38.000000Z

> Rogan was allowed to spew unsubstantiated alt-right statements.

So just normal JRE content

FelipeCortez · 2024-07-17T19:13:31.000000Z

There's a more comprehensive interview that also includes Aho and Weinberger in the book Masterminds of Programming. Highly recommended

chasil · 2024-07-17T20:12:33.000000Z

His wiki lists several interviews, for those inclined:

https://en.wikipedia.org/wiki/Brian_Kernighan

http://www.linuxjournal.com/article/7035

http://www-2.cs.cmu.edu/~mihaib/kernighan-interview/index.ht...

https://web.archive.org/web/20090428163341/https://www.princ...

https://web.archive.org/web/20131126220450/http://princetons...

kragen · 2024-07-17T19:58:16.000000Z

i've been reading the unix programming environment from 01983 this week (the old testament of kernighan and pike), about 35 years later than i really should have. awk is really one of the stars of the book, the closest thing in it to currently popular languages like js, lua, python, perl, or tcl. (what awk calls 'associative arrays' js just calls 'objects'.)

the 7th edition unix version of awk from 01979 https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/aw... (from https://www.tuhs.org/Archive/Distributions/Research/Henry_Sp...) is only 2680 lines of source code, which is pretty astonishing. the executable is 46k and ran in the pdp-11's 64k address space. as far as i can tell, that version of awk, and also the one documnted four years later in tupe, didn't even have user-defined functions. neither did the v7 version of the bourne shell, i think

bc, however, did

mananaysiempre · 2024-07-17T20:17:08.000000Z

> that version of awk [...] didn't even have user-defined functions

Also known as “old awk” or (retronymically) “oawk”. Heirloom still carries[1] what should be a descendant of it (alongside the more familliar nawk).

Also,

> the executable is 46k

thus still a bit larger than Turbo Pascal 3.02[2] :) That didn’t have a regex engine, admittedly.

[1] https://heirloom.sourceforge.net/man/awk.1.html

[2] https://prog21.dadgum.com/116.html

kragen · 2024-07-17T23:46:29.000000Z

the 39.7-kilobyte turbo pascal executable doesn't include the (modest) pascal runtime library, because it is itself written in assembly, not in pascal. awk, being an interpreter, does have to include the runtime library in the same executable as the parser. awk is also written in c rather than assembly, and moreover using lex and yacc; lex compiles the lexer into a deterministic automaton written in c for speed, and all of this is compiled with pcc, whose optimizations were relatively weak by today's standards

i haven't compiled version 7 myself so i don't have a clear idea of how big it is

a slight correction to my above comment! on running `file *` in bin/, i see, among other things

    at:       PDP-11 pure executable
    awk:      PDP-11 separate I&D executable
    bas:      PDP-11 pure executable
    basename: PDP-11 pure executable
    bc:       PDP-11 pure executable

if i understand correctly, this means that this awk executable is built with what we used to call the 'small memory model' on the 8088, using separate instruction and data segments. that way, awk could have data stored at the same address as executable code, so you had 64 kibibytes available for just the data. you didn't have to squinch your awk program and all its data into the 65-46 = 19 kilobytes left over after the awk executable was loaded

this, in effect, allows you 128 kilobytes for the functionality of your program: when you call an awk builtin function (usr/src/cmd/awk/proc.c lists seven named functions: substr, sindex, sprintf, printf, print, split, and instat) you're invoking code in the instruction segment, while the logic of your interpreted code shares the data segment with the data you're manipulating (strings or arrays or whatever)

so it's at least misleading that i said it 'ran in the pdp-11's 64k address space'

not all pdp-11s supported separate instruction and data segments (i think the more expensive pdp-11/45 did, but the pdp-11/40 and pdp-11/20 didn't) and most of the 7th edition executables didn't require that. but awk, adb, dcheck, dd, fgrep, icheck, lex, ncheck, refer, restor, sort, tar, tbl, uucp, uulog, uux, and yacc did. i'm not sure how you were supposed to run a unix system without dd, icheck, tar, uucp, restor, or adb (sdb and dbx didn't exist), so i think maybe at this point they had given up on supporting the smaller models

coliveira · 2024-07-17T22:54:55.000000Z

Turbo Pascal was written in asm and highly optimized to reduce its size.

fuzztester · 2024-07-18T05:59:55.000000Z

in under 40 KB it had a lightning fast compiler that generated .COM files, and a text based IDE with a WordStar compatible code editor.

tannhaeuser · 2024-07-17T21:05:28.000000Z

> the closest thing [...] to currently popular languages like js

awk is more than that: having introduced the "function" keyword, for e in a, semicolon-less syntax, stringly-typing, delete x[e], etc, etc, it's quite obviously the base for JavaScript, and Brendan Eich said as much [1].

[1]: https://brendaneich.com/2010/07/a-brief-history-of-javascrip...

kragen · 2024-07-17T21:17:32.000000Z

i'd say perl is closer, but for (e in a) is from awk and not perl

semicolonless syntax is in js, tcl, lua, and sh, but not awk or perl; in awk omitted semicolons are interpreted as string concatenation

dakiol · 2024-07-17T21:43:39.000000Z

I read the same book and got the exact same conclusion. Then I read The UNIX—Haters Handbook and thought that Unix tools (like awk) are sadly not good enough anymore.

kragen · 2024-07-17T21:57:28.000000Z

yeah, there are a lot of questionable design decisions in there that are only defensible in the context of the pdp-11's 64k address space and slow cpu

fuzztester · 2024-07-18T04:14:48.000000Z

what are some of those?

kragen · 2024-07-18T04:32:07.000000Z

making everything a string, for example. representing string length with null termination. splitting shell variables into words after variable substitution (rather than having a list type). a zillion shell-script bugs when filenames, directory names, search strings, etc., contain spaces. not having user-defined subroutines in sh or awk. no command-line editing. command names and options that read the same as you type them (thus forcing you to choose between unreadable commands and commands that are too unwieldy to type comfortably). no tab completion. more generally, no interactive feedback on what is going to happen when you submit your command. no job control. having to retype command output as new input all the time. table layout that only works with fixed-width fonts. filenames limited to 14 characters. no text search through previous output. hardlinks rather than reflinks. an inflexible security policy defined in the kernel. the reification of . and .. in the filesystem. exposing the layout of filesystem directories to every program that enumerated directory entries. no way to undo rm or mv. no screen editors. arbitrary line-length limits all over the place. . in the path. textual macro expansion as the primary form of higher-order programming. backslashed backquotes. backslashed backslashes. total precedence orderings on infix operators. slightly different regular expression syntaxes for each program. self-perpetuating at jobs. etc.

a lot of these have been fixed, but some are hard to fix

38 · 2024-07-18T00:14:16.000000Z

> 01983

> 01979

thats not how you write years, or even numbers really.

buzzy_hacker · 2024-07-18T00:18:04.000000Z

Guessing it’s from following the style of https://en.m.wikipedia.org/wiki/Long_Now_Foundation

38 · 2024-07-18T03:03:13.000000Z

[flagged]

buzzy_hacker · 2024-07-18T10:39:38.000000Z

I think Wikipedia has it wrong actually after reading more closely. It’s not a way to solve the year 10,000 problem, as in a date formatting issue akin to Y2K, as Wikipedia suggests.

Instead it’s to get you thinking of the long term future.

The Wikipedia article does say that is the goal of the Long Now Foundation, but not how the leading 0 is related.

gruturo · 2024-07-18T07:31:20.000000Z

That's also idiotic because we don't put a leading zero in front of 1, 2 or 3 digit years, and neither will future historians put one in front of 4-digit ones - it's just gratuitously annoying.

Charlemagne was crowned in 800 AD, not in 0800 or 00800.

Pompeii was destroyed by the Vesuvius in 79 AD, not 079 or 0079 or 00079.

The battle of the Teutoburg Forest was in 9 AD, not 09, 009, 0009 or 00009.

Also why stop at 5 digits while you're being ridiculous?

defrost · 2024-07-18T07:33:32.000000Z

. . . and now you're engaging.

Church of the Long Now wins again.

kragen · 2024-07-18T07:36:06.000000Z

please take this kind of 'engagement' back to twitter where it belongs

defrost · 2024-07-18T07:37:37.000000Z

You've replied to the wrong person I suspect.

kragen · 2024-07-18T07:41:53.000000Z

apologies, yes

defrost · 2024-07-18T07:45:13.000000Z

No drama, footguns are easy mistakes to make.

I'm happy to expand for others; regardless of my feelings on the matter the Long Now use of a leading zero creates questions and engagement, thus spreading their message under the mantra of "there's no such thing as bad press" | "get the public talking".

It's an Advertising 101 strategy and not at all idiotic save perhaps at the surface of first exposure.

greentxt · 2024-07-19T15:02:46.000000Z

I've been reminded before that comparing hn commentary to reddit or other communities is itself a violation of hn guidelines. You can just upvote/downvote or "tap the sign" by posting the hn guidelines. No need for insinuating the redditization nor twitterization of hn commentary.

kragen · 2024-07-19T19:33:59.000000Z

appreciated

kragen · 2024-07-18T04:08:49.000000Z

some of us post hn comments containing analyses of source code from early versions of unix, results from tracing the execution of uxn 'left' and other text editors to estimate their power usage https://news.ycombinator.com/item?id=40842397, overviews of the historical development of cpu architecture https://news.ycombinator.com/item?id=40983675, simple explanations of the role that digital signal processing plays in modern mixed-signal systems https://news.ycombinator.com/item?id=40985558, quantitative analyses of the economic and technological impact of the renewable energy transition https://news.ycombinator.com/item?id=40725765 https://news.ycombinator.com/item?id=40724305, assembly-language programs to clarify the role of associative arrays in computing https://news.ycombinator.com/item?id=40990055, analyses of mostly forgotten programming languages that python evolved from https://news.ycombinator.com/item?id=40992191, examples of how to use the gnu units program for all kinds of things most people didn't know it could do https://news.ycombinator.com/item?id=36988917, comparative benchmarks of different compilers and interpreters https://news.ycombinator.com/item?id=40960819, system call traces to elucidate subtle concurrency bugs in unix https://news.ycombinator.com/item?id=40955923, close readings of federal court rulings placing them in historical context https://news.ycombinator.com/item?id=40950806, gui libraries we wrote ourselves https://news.ycombinator.com/item?id=40942566, simple workarounds for frustrating unix problems https://news.ycombinator.com/item?id=40911681, appreciation for others https://news.ycombinator.com/item?id=40950084, recipes for video streaming from the command line https://news.ycombinator.com/item?id=40911239, admissions of error with reproducible scripts for clearly showing we were wrong https://news.ycombinator.com/item?id=40912051, helpful tips for everyday command-line usage https://news.ycombinator.com/item?id=40910845, thanks to others for correcting our errors https://news.ycombinator.com/item?id=40911035, quantitative analyses of cutting-edge hardware https://news.ycombinator.com/item?id=40903708, admissions of error in our analyses of chinese export-control regulations https://news.ycombinator.com/item?id=40885691, simple explanations of cutting-edge computer security research https://news.ycombinator.com/item?id=40885329 https://news.ycombinator.com/item?id=40841095, simple explanations of cutting-edge software testing tools https://news.ycombinator.com/item?id=40879952, and implementations in three lines of forth of things people think are impossible in forth https://news.ycombinator.com/item?id=40868981

and some of us say things like 'thats idiotic because thats 7000 years away'

these are two different kinds of discourse, and this website is only for one of them

chockablock · 2024-07-18T09:08:48.000000Z

It’s a pointless affectation that distracts from your technical content.

The evidence for your behavior being distracting/provocative is right here in this thread.

kragen · 2024-07-19T03:56:07.000000Z

what distracts from my technical content is vacuous, aggressive replies to it by people who ask me to take responsibility for their behavior. no, thank you

to you too i ask: which of these two kinds of discourse are you engaging in right now?

if you're capable of the other one, i suggest you demonstrate that

your comment purports to be concerned with the objective qualities of mine: 'pointless', 'distracting', 'provocative'. but actually all of these terms are attempts to recast your subjective preferences as objective realities; all you're really writing about is how you feel when you read my comments. the implicit premise here is that other people ought to be writing their comments in order to satisfy your preferences

why? what makes you so admirable? what reason does anyone in the world have to value your opinion or consider your preferences important? if you want comments written to satisfy your preferences, write them yourself

as far as i can tell, like 38, gruturo, and spencerflem, you're just a person who harasses strangers on internet fora if you think they talk funny, then blames your own behavior on them

you want to know what i feel?

because it sure isn't admiration and eagerness to satisfy you

it's contempt

akira2501 · 2024-07-18T04:43:00.000000Z

Why not 001982?

When are leading zeros in front of numbers particularly useful? Even in fixed column formats they're almost always left blank.

WesolyKubeczek · 2024-07-18T14:13:06.000000Z

He doesn’t give sentient life more than until AD 99999 in this universe.

kragen · 2024-07-19T04:41:44.000000Z

i fully support your choice to henceforth use two leading zeroes on your years or otherwise format your comments however you please

spencerflem · 2024-07-18T04:14:15.000000Z

Dude, they've got a point, your style is annoying and not even right in a "technically correct" sorta way. You don't write things that happened 5 AD as 00005 AD.

kragen · 2024-07-18T04:18:02.000000Z

which of these two kinds of discourse are you engaging in right now

if you're capable of the other one, i suggest you demonstrate that

dude

inglor_cz · 2024-07-18T09:08:16.000000Z

Personally, I can take the 0abcd style of years without going crazy, but it definitely derails my reading flow a bit. I would compare it to writing personal names without capitals, like donald trump.

But I also know that it is kragen commenting without looking at the header :)

I suppose everyone has their own quirks.

kragen · 2024-07-19T06:38:44.000000Z

simpaticoder · 2024-07-18T05:38:06.000000Z

Why not 3 or even 10 leading zeroes? Why so pessimistic?

dredmorbius · 2024-07-24T22:56:43.000000Z

<https://news.ycombinator.com/item?id=41002048>

bch · 2024-07-18T01:42:57.000000Z

They're written in awktal.

fuzztester · 2024-07-18T04:18:01.000000Z

are you sure it's not hexal or binal?

kragen · 2024-07-18T04:33:34.000000Z

as long as it's not intercal

fuzztester · 2024-07-18T19:28:52.000000Z

he he. iirc, esr said somewhere that he wrote an interpreter for intercal.

cinntaile · 2024-07-18T12:23:24.000000Z

Those 0s in front of the year really tripped me up... I thought it was some obscure date format.

kstrauser · 2024-07-18T01:39:30.000000Z

Why do you put zeroes in front of your dates?

defrost · 2024-07-18T01:44:58.000000Z

It's a nod to the Long Now Foundation

https://longnow.org/

and can be taken in many ways, either as a sign of a true believer or just to invoke a sense of the scale of time beyond the immediate.

cafard · 2024-07-18T10:44:52.000000Z

Years ago, somebody had as his Usenet .sig line "perl is margarine, awk is butter" (unless the order was reversed). Whatever Perl's many faults, I used awk a good deal less after I discovered Perl.

rswail · 2024-07-18T04:44:55.000000Z

I own a copy of K&R that is signed by Dr Kernighan from an Australian Unix conference back in the 80s. One of my prized possessions.

That book, along with The Practice of Programming by Kernighan/Pike and Byte magazine etc were my "Stack Overflow" for the 1980s.

Zambyte · 2024-07-17T19:04:08.000000Z

It's interesting to me that he refers to association arrays as "newish", when they showed up in Lisp nearly 20 years earlier.

jerf · 2024-07-17T19:36:41.000000Z

Two things to keep in mind: Pre-internet, it was quite easy to be skilled and knowledgeable and yet never have been exposed to something another community somewhere would consider basic, or for it to take many many years to percolate across. I remember getting into programming even in the 1990s and I can tell you a lot of the C world would still consider associative data structures like a hash table something to be a bit exotic and only to be pulled out if you had no other choice. It was very easy to not be very up to date on these things.

Second, while the programming world fancies itself a very fast mover, it really isn't, and it especially wasn't back then. 20 years isn't necessarily as long as you would think in 2024. We're still trying to convince people in 2024 that you really don't need a language that will happily index outside of arrays, and how long has that been a staple in existing languages? (You may think that's silly, but we've had that fight on HN, yes, yea verily, this very year. Granted, this is winding down, but it's still a live argument.)

There's a lot of things that showed up in some language in the 1960s or 1970s and wasn't widely adopted for another few decades.

(Though I suspect one problem with Lisp is that back in the day, its performance was pretty awful relative to the other things of the day. It was not always the best advertisement for its features. A bit too ahead of its time sometimes.)

vram22 · 2024-07-17T23:51:52.000000Z

>I remember getting into programming even in the 1990s and I can tell you a lot of the C world would still consider associative data structures like a hash table something to be a bit exotic and only to be pulled out if you had no other choice. It was very easy to not be very up to date on these things.

That's a little surprising, considering that the classic first book about C, K&R was first published in 1978 and again in 1988, the first and second edition, respectively.

One of the editions, maybe the first, had a simple hash table implementation, that just summed up the ASCII values of the characters in the given string, modulo some value, and used that as the key for the string being hashed.

.

https://en.m.wikipedia.org/wiki/The_C_Programming_Language

xarope · 2024-07-18T01:51:53.000000Z

I remember being taught about associative caches in CPU architecture, and later on programming the same in C. So I'm not sure about the 90's and hash tables being "exotic".

My first real use of hashes was in perl4; with perl5 released in 1996 (https://dev.perl.org/perl5/news/), surely it meant people were also well aware of the usefulness of hash arrays even back then.

cancerhacker · 2024-07-18T05:53:26.000000Z

It was a common practical interview question in 1996 to ask a candidate to implement a hash table in C.

satiated_grue · 2024-07-24T16:57:38.000000Z

I recall implementing hash tables in COBOL on a Univac. Not even kidding a little.

kragen · 2024-07-18T06:19:25.000000Z

the picture you're painting here is absurdly far from reality

— kernighan definitely did not think hash tables were 'a bit exotic' in 01978 —

hash tables (or sometimes other associative retrieval structures) were standard fare in computer science degree programs and were a standard feature of compilers, assemblers, and linkers, of which kernighan had written several at that point; associative retrieval structures are also fundamental to databases and filesystems. in 01979, 7th edition unix (in which kernighan published awk) barely had date formatting but included an on-disk hash table library called 'dbm'

hash tables are the exclusive topic of pp. 506–549 of knuth's taocp volume 3, a standard textbook which came out in 01973. after recommending you some hash functions, on p. 512 (525/739) knuth recommends you check out a survey of hash functions published in cacm by lum, yuen, and dodd in 01971, and he cites papers on hashing in cacm as early as 01959 and in an ibm journal as early as 01957. in the 'history' section (pp. 540–542 (553–555/739)) he traces it back to an internal ibm report by luhn in 01953, and its first report in the open literature to a paper by dumey in 01956. most of the rest of the 739-page book is about how to implement associative arrays of various kinds, although some parts instead concern the string search problem, and the first half is about sorting, and a sorted table can be used for purposes other than associative retrieval

knuth also mentions a 'very influential survey' of hashing published in cacm in 01968 by robert morris, kernighan's coworker at bell labs, and one of the very first users of unix and a major contributor to it

but perhaps kernighan hadn't read cacm? on the contrary, he and his colleagues regularly published in it; he published in cacm once in 01972, once in 01973, and once in 01975, and his colleagues such as stephen johnson, mike lesk, al aho (his coauthor on awk), and, as i said above, robert morris, published there regularly as well

in the practice of programming, kernighan and pike claim that almost all of the time, the only data structures you need are arrays, linked lists, trees, and hash tables (though they wrote this in 01999)

what kernighan is talking about here is including them as a basic language construct, a design choice his 01978 tech report on awk likens to that of snobol4, which was a well-known language but not a mainstream one. in the context of the above cacm articles, you can perhaps see that the daring choice was to offer a single 'good enough' hash table implementation rather than one optimized for a given purpose—and to use it even in place of conventional arrays

— kernighan was not isolated like you were —

kernighan was at bell labs, which was probably the most connected place in the software world at the time

when awk was released in 01979 the internet was already ten years old, and at&t (the parent company of bell labs) had installed all its long-distance links

in 01976 they'd also developed their own 'internet' called uucp, which at the time of awk's release was interconnecting 80 different unix sites, interchanging software updates on a daily basis, over 300-baud and 1200-baud modems. they were also in contact with ken thompson's alma mater berkeley; thompson took a sabbatical year to teach there in 01975. berkeley got a darpa contract to add tcp/ip support to the bell labs operating system 'unix', which would become the majority of the internet during the 01980s; berkeley participated actively in the uucp network

bell labs was also working on the so-called 'advanced communication system', its own misconceived version of the internet, where data would be routed using phone numbers, because data communications had become the fastest-growing source of call traffic. (colin berkshire has written a fantastic overview of this forgotten history.)

jerf · 2024-07-18T15:24:46.000000Z

"Your comments are ridiculous! One of the smartest, most connected people in the world was smart and connected! And papers were being published all over the place!"

The computer world of the time was not clones of Kernighan a thousand times over, nor was everyone reading every single paper any more than people do today. You can't prove the normal world of computing was all aware of all these things by pointing at the very pinnacle of it. I'm talking about the real world, not the best of the best of the best.

abecedarius · 2024-07-18T21:46:55.000000Z

But, like, as a teen in the 80s by the time I learned C from Kernighan & Ritchie (not an exotic path back then), where a hashmap was one of the examples and the subject of a couple of the problems, it wasn't remotely new to me. The real world had plenty of others learning these things from Byte magazine and such even with no CS degree.

brianmurphy · 2024-07-21T23:23:03.000000Z

The big difference was in C you had to implement it yourself. Choosing a hash function and managing memory with malloc() was a hang up for a lot of people. It was quite common to say hashes and the various tree data structures weren't worth the complication if you didn't absolutely need to have a hash table or tree.

In later languages like perl, the syntax for hashes was built into the language.

That was a huge step forward.

Perl was slowly adopted for other reasons (illegible write once syntax). Other popular scripting languages like PHP adopted hashes from perl. This greatly helped make associative arrays common knowledge among the majority of programmers.

kragen · 2024-07-19T03:26:19.000000Z

also dr. dobbs. https://archive.org/details/dr_dobbs_journal_vol_03/page/n29... is a presentation of 'sam76', an 8080 reimplementation of mcilroy & morris's m6, probably built around an efficient associative array of defined macros. and assemblers built around symbol tables were commonplace at the time, too

kragen · 2024-07-19T02:50:16.000000Z

in the interview, kernighan said, 'The main idea in Awk was associative arrays, which were newish at the time, but which now show up in most languages either as library functions (hashmaps in Java or C++) or directly in the language (dictionaries in Perl and Python). Associative arrays are a very powerful construct, and can be used to simulate lots of other data structures.'

zambyte expressed surprise that kernighan referred to 'associative arrays' as 'newish' and said that lisp had had them 20 years before. what lisp had 20 years before (presumably they meant in https://www-formal.stanford.edu/jmc/recursive.pdf in 01959, not 01957, when mccarthy was trying to figure out how to add linked lists to fortran) was this line of code:

    assoc[x; y] = eq[caar[y]; x] → cadar[y]; T → assoc[x; cdr[y]]

which we can translate to python as

    assoc = lambda x, y: y[0][1] if y[0][0] == x else assoc(x, y[1:])

and which allows you to use a list of 2-tuples such as [('red', '#f00'), ('green', '#0f0'), ('white', '#fff')] as an associative array

you replied, saying, 'it was (...) easy to be (...) knowledgeable and yet never have been exposed (...) in the 1990s (...) I can tell you a lot of the C world would still consider associative data structures like a hash table something to be a bit exotic (...). It was very easy to not be very up to date on these things.'

that comment is only relevant to zambyte's comment—to which you posted it as a reply—if you had the implicit assumption that what kernighan meant by 'associative arrays' was 'associative data structures like a hash table'. but associative data structures like hash tables were not newish; hash tables in particular were 22 years old in 01979, and were already the most commonplace of all data structures except for strings and arrays. i listed 13 articles in cacm in 01975 and 01976 about associative data structures. cacm only published about 6 articles per issue (excluding the news and opinion departments), so that's roughly 10% of everything they published

you say 'you can't prove the normal world of computing was all aware of all these things', but the everyday world of computing was, if anything, even more preoccupied with associative data structures than the academic world. ibm's most lucrative product was ims, an online database, which is to say, a large on-disk associative data structure. filesystems (including dectapes), variable namespaces in interpreters and compilers, symbol tables (in assemblers, linkers, and compilers), and databases are associative arrays. even in 01979 it was already hard to find programmers who did not use any of filesystems, assemblers, interpreters, or databases (though they did exist)

so it is completely implausible that what kernighan was describing as 'newish' (in 01979) was associative data structures like a hash table. i'm not doubting you when you say you knew c programmers in the 90s who considered them 'exotic', perhaps the province of the wizards who wrote compilers and filesystems, but those c programmers were not 'the normal world of computing' or 'experienced and knowledgeable'; they were beginners

the 01978 edition of k&r presents binary search (using a sorted array as an efficient associative array) in chapter 3 on p. 54 (62/236), before presenting such advanced language features as the 'break' statement, '#define', structs, non-integer function return values, and pointers. in chapter 6 on p. 125 (133/235), they use such an associative array to count the number of times each c keyword occurs in the input. §6.5 (pp. 130–134 (138–142/235)) demonstrates a more scalable associative array using a binary search tree. §6.6 (pp. 134–136 (142–144/235)) gives another program using a hash table

so for k&r in 01978, hash tables were more basic than features such as unions, typedef, string formatting (%.10s, etc.), opening files, and listing directories, all of which are relegated to later in the book

please don't tell me 'a lot of the c world' in the 01990s hadn't heard of k&r. next you're going to be telling me a lot of the c world hadn't heard of c

getting back to the interview, what kernighan was evidently referring to was not associative array data structures in general, but having a general-purpose associative data structure library always at hand. lisp's assoc (or assq) isn't general-purpose because it's too slow for large associative data structures. lisp plists aren't general-purpose because they leak memory (you can't remove a property from all atoms, and each atom you put a property on continues to occupy memory even if you remove the property from it) and because they get too slow if you have more than a few properties. ims isn't general-purpose because you have to go talk to the database administrator to create a new file; you can't create a temporary one in ram. the example hash table in k&r isn't general-purpose because it uses a static array, a fixed size of 100 hash buckets, has no deletion, associates specifically a string value with each (string) key, and is quite inefficient; it's a reasonable starting point, but you have to tweak it for your application

java hashmaps, awk arrays, and python dicts are general-purpose as long as your data fits in ram. (you could gripe that awk arrays aren't first-class objects and can only have string and numeric keys, but that's just because awk doesn't have first-class objects at all in the lisp sense.)

precisely as a result of this, hashing (like sorting) has ceased to be a beginner-programmer topic; if you need an associative array or a sorted list, you can get quite far just by invoking the built-in facilities of your programming language. sooner or later you will probably want one optimized for some specific application, but it's no longer something you have to do all the time. but to any working programmer in the 01970s, it would have seemed absurd to propose linear searches or hash tables were 'newish'

Zambyte · 2024-07-17T20:02:45.000000Z

I totally get the first point, but this comment from Brian was made when the internet had been available for many, many years. It seems more like it felt newish at the time, and that is how he has seen it since then. I guess that's more why I find it to be an interesting comment.

Regarding the second point, when I hear the term "associative arrays" or "associative lists" I think of them in a pretty high level of abstraction. Like really: a C struct is an association abstraction. A simple array of structs is not often considered an "associative array", but it can easily be used for the same things.

Maybe it would be more accurate to say that the way they were using associative arrays way newish, rather than the construct itself.

chasil · 2024-07-17T19:16:40.000000Z

This is from the first edition free PDF online of The AWK Programming Language:

"Associative arrays were inspired by SNOBOL4 tables, although they are not as general. Awk was born on a slow machine with a small memory, and the properties of arrays were a result of that environment. Restricting subscripts to be strings is one manifestation, as is the restriction to a single dimension (even with syntactic sugar). A more general implementation would allow multi-dimensional arrays, or at least allow arrays to be array elements."

Stable release - SNOBOL4 / 1967

https://en.wikipedia.org/wiki/SNOBOL

shawn_w · 2024-07-17T23:25:30.000000Z

Common Lisp, of course, has built in hash tables. MacLisp doesn't seem to - I didn't see anything about them in the Pitmanual. ZetaLisp/LML and InterLisp have them. So it seems like built in lisp associative arrays (I'm not counting alists, or property lists, though the latter goes back to the early days and maybe should be) are roughly contemporary to the birth of awk. Maybe a bit newer.

kragen · 2024-07-17T20:31:44.000000Z

in a sense lisp alists are 'associative arrays', sure. but, although they were used in the metacircular interpreter that bootstrapped lisp and still inspires us today, they aren't a language feature, and the lisp ones aren't efficient enough for the things you use them for in awk

you can of course build an associative array in any language, especially if you're satisfied with the linear search used by lisp alists. you can hack one together in a few minutes almost without thinking:

            .intel_syntax noprefix
    lookup: push rbx
            push rbp
            mov rbx, rdi    # pointer to alist node pointer
            mov rbp, rsi    # key
        2:  mov rdi, [rbx]  # load pointer to alist node
            test rdi, rdi   # is it null?
            jnz 3f          # if not null, skip not-found case
            mov rax, rbx    # pointer to null pointer is return value
            xor rdx, rdx    # but associated value is null (sets zero flag)
            jmp 4f          # jump to shared epilogue
        3:  mov rdi, [rdi]  # load pointer from key field of alist node
            mov rsi, rbp    # reload key from callee-saved register
            call comkey     # sets zero flag if rsi and rdi are equal keys
            jne 1f          # skip following return-value case code if !=
            mov rax, rbx    # pointer to node pointer is return value
            mov rdx, [rbx]  # also let’s follow that pointer to the node
            mov rdx, [rdx + 16]    # and return its value field too in rdx
            test rax, rax   # clear zero flag (valid pointer is not null)
        4:  pop rbp
            pop rbx
            ret
        1:  mov rbx, [rbx]  # load pointer to alist node again
            add rbx, 8      # load pointer to next-node pointer field
            jmp 2b

(untested, let me know if you find bugs)

but that's very different from having them built in as a language feature, like snobol4, mumps, awk, perl, python, tcl, lua, and js do. the built-in language feature lisp had for structuring data 20 years earlier was cons, car, and cdr, which is not at all the same thing

other programs on unix that contained associative arrays prior to awk included the linker (for symbols), the c compiler, the assembler, the kernel (in the form of directories in the filesystem), and the shell, for shell variables. none of these had associative arrays as a programming language feature, though. the bourne shell came closest in that you could concatenate variable names and say things like

     eval myarray_$key=$val

here's the implementation of binary tree traversal for setting shell variables in the v7 bourne shell from 01979

    NAMPTR          lookup(nam)
            REG STRING      nam;
    {
            REG NAMPTR      nscan=namep;
            REG NAMPTR      *prev;
            INT             LR;

            IF !chkid(nam)
            THEN    failed(nam,notid);
            FI
            WHILE nscan
            DO      IF (LR=cf(nam,nscan->namid))==0
                    THEN    return(nscan);
                    ELIF LR<0
                    THEN    prev = &(nscan->namlft);
                    ELSE    prev = &(nscan->namrgt);
                    FI
                    nscan = *prev;
            OD

            /* add name node */
            nscan=alloc(sizeof *nscan);
            nscan->namlft=nscan->namrgt=NIL;
            nscan->namid=make(nam);
            nscan->namval=0; nscan->namflg=N_DEFAULT; nscan->namenv=0;
            return(*prev = nscan);
    }

if you're not sure, that's c (now you know where the ioccc came from)

foobarian · 2024-07-17T21:25:33.000000Z

I love that Bourne abused C macros to make sh source look like Algol. Lots of references to this elsewhere but this is my favorite:

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

kazinator · 2024-07-18T03:58:53.000000Z

The "killer feature" of alists is that they are persistent data structures; when we extend a shorter alist to make a longer one, the shorter one is unaffected. Moreover, if the program has only a reference to the tail portion of a longer alist, the head portion can be garbage-collected.

With this, we can correctly implement lexical closures: we simply take the current environment (head of the alist) ssociate it with the lambda body and arguments, and we are done.

kragen · 2024-07-18T04:15:30.000000Z

yes! also when you only have two or three things in them, they are faster than hashing, especially on machines without a data cache. clojure uses hamts for this, thus getting this fp-persistence† as well as efficient lookup in large dictionaries, and chris wellons posted a concise, fast c hamt implementation last year at https://nullprogram.com/blog/2023/09/30/

______

† 'persistent' unfortunately also means 'stored in nonvolatile memory', so i feel the need to disambiguate

chuckadams · 2024-07-18T21:00:18.000000Z

I’d say plists have a stronger claim on being “real” associative arrays. Lisp never made much distinction between “built in” and “library” to begin with, you're using the same syntax to use the feature either way.

kragen · 2024-07-19T03:52:52.000000Z

i think it's true that, much of the time that you'd use a dict in python, you'd use a property in 01960s lisp, because it let you take advantage of the read-time implicit hashing of symbols; you probably only had one or two properties on most of your symbols, so getting from the interned symbol to the property was a matter of a handful of instructions---probably faster than a hash function. but conceptually i feel like they're pretty different ideas

JonChesterfield · 2024-07-17T19:45:52.000000Z

I'm very taken with the regex to lex to yacc to awk sequence of developments. There's a very convincing sense of building on prior work to achieve more general results.

ldjkfkdsjnv · 2024-07-18T00:18:30.000000Z

People will hate me for saying this, but I bet there are many many repos on github written by programmers that are better than all these coding legends. Back then, it was a much smaller pool.

azemetre · 2024-07-18T00:20:33.000000Z

Anyone in particular you would you consider great? Always neat to follow the careers or happenings of the new blood.

funemployd · 2024-07-18T02:02:40.000000Z

I don't think anyone ever claimed Brian Kernighan was the best programmer ever.

rubiquity · 2024-07-18T16:22:47.000000Z

Your comment is naive but I’m dying to know: how would you measure better?

syngrog66 · 2024-07-18T14:49:24.000000Z

Awk Creator: funny!

dfedbeef · 2024-07-18T16:30:32.000000Z

Very brief

kansai · 2024-07-17T18:48:25.000000Z

Describing the K of K&R as "Awk creator" is like describing Einstein as a "refrigerator engineer".

overhead4075 · 2024-07-17T18:54:43.000000Z

It's an awk interview. "Awk creator" is relevant.

felideon · 2024-07-17T20:03:17.000000Z

Right, the headline gave the appropriate context for the interview in a short sentence. Not everyone knows Kernighan created awk, as opposed to a respected person that happens to have opinions on awk.

Gormo · 2024-07-17T20:10:41.000000Z

AWK's name is actually an initialism of the last names of its three creators: Aho, Weinberger, and Kernighan.

usr1106 · 2024-07-17T20:56:46.000000Z

Aho, one of the authors of the dragon books. When I studied CS 40 years ago it was given standard literature on compiler construction, a mandatory course.

No idea whether it is still used today. Well, no idea whether there is anything fundamentally new in compilers such an old book would not cover.

ryangs · 2024-07-17T21:19:20.000000Z

Indeed, still in use. Also presumably the Aho of the Aho-Corasick string matching algorithm.

funemployd · 2024-07-18T02:03:43.000000Z

There are plenty of advanced compiler techniques, but you can't understand them until you understand the fundamentals. There will always be a place for the purple dragon book.

usr1106 · 2024-07-18T06:41:07.000000Z

Ah, it's purple now. It was green when I studied, became red before I graduated. When I search for it now the thumbnails look unfamiliar.

riiii · 2024-07-17T19:44:18.000000Z

He didn't say it was irrelevant.

midiguy · 2024-07-17T19:09:23.000000Z

I agree it was an incredibly awkward interview

JohnFen · 2024-07-17T20:36:48.000000Z

Why do you say that? developing Awk was no less of an accomplishment than C.

alexthehurst · 2024-07-17T21:21:15.000000Z

This seems like a rather large claim that would benefit from some kind of supporting argument.

kragen · 2024-07-17T21:29:26.000000Z

c-descended languages include c++, java, and c#

awk-descended languages include perl, tcl, js, python and lua

it's a close race but if i had to pick i'd say the second group is more influential

shawn_w · 2024-07-17T23:30:51.000000Z

Most implementations of that second group of languages are written in C, and run on operating systems written in C. It's really no contest that C has had far more of an influence on computing. So much software would not exist, or not in the forms we recognize, without it.

kragen · 2024-07-18T00:11:15.000000Z

or at least in c++, c#, or java, sure, smalltalk being the main exception

but the layers are somewhat orthogonal. at the verilog level you can't really tell if your hardware is built out of luts, cmos standard cells, hand-laid-out nmos, ttl, or vacuum tubes; at the operating system level you can't really tell if your cpu was designed in vhdl, verilog, chisel, or hand-drawn schematics; at the interpreter level you can't really tell if your operating system was written in c, pl/x, assembly, or lisp; and at the python level you can't really tell if you're running on cpython (which is written in c) or pypy (which is written in python)

i mean there are small clues that make it hard to completely hide, but they're subtle

so, to my way of thinking, saying 'c has had far more of an influence on computing than awk', on the basis that awk was written in c, is similar to saying 'transistors have had far more of an influence on computing than c'. it's sort of true, but it's also sort of false

esafak · 2024-07-18T00:53:58.000000Z

Python influenced by AWK? Never heard that before, and Wikipedia does not list awk among python's numerous influences, either. https://en.wikipedia.org/wiki/Python_(programming_language)

Qem · 2024-07-18T02:18:58.000000Z

https://x.com/gvanrossum/status/1305529996383928320

kragen · 2024-07-18T03:22:11.000000Z

this says 'abc, shell, awk', but what is guido responding to?

Qem · 2024-07-18T09:13:25.000000Z

Not sure, but by the context of the conversation, it seems he complained people keeps saying lisp was an inspiration for Python, when in fact he was not even thinking about it by the time Python was created. Then it appears someone asked him what he actually based on then, and he answered abc, shell, awk. But given the lost message in the thread, perhaps you can ask him for clarification, to set the record straight.

kragen · 2024-07-18T10:18:15.000000Z

i see, thanks!

yeah, i think it's well-known that lambda, map, filter, and reduce (the lispy bits of python) weren't in the earliest versions of python; they were contributed later on by a lisp fan

at a larger remove, of course, lisp was an inspiration for abc (via lcf and setl), shell (via algol), and awk (in a smaller way)

fuzztester · 2024-07-18T17:02:33.000000Z

I may be wrong, but I think apply() was there in an early version of python, maybe 1.5.

kragen · 2024-07-19T00:05:55.000000Z

1.5 was several years later

fuzztester · 2024-07-19T00:40:35.000000Z

but it was also several years ago, which is when I first started playing around with python, before starting to do production work with it a few years later.

so in one sense, i.e. compared to the present, 1.5 is an early version.

https://en.m.wikipedia.org/wiki/History_of_Python

kragen · 2024-07-19T23:52:23.000000Z

all true

kragen · 2024-07-18T03:17:13.000000Z

kernighan said in the interview

> The main idea in Awk was associative arrays, which were newish at the time, but which now show up in most languages either as library functions (hashmaps in Java or C++) or directly in the language (dictionaries in Perl and Python). Associative arrays are a very powerful construct, and can be used to simulate lots of other data structures.

awk was released in 01979, and the authors published this paper in sp&e that year: https://plan9.io/sources/contrib/steve/other-docs/awk.pdf but you see this report version is dated september 01978. but i don't think the report was widely circulated until the next year, when it was included in 7th edition unix as /usr/doc/awk (sudo apt install groff; groff -ms -Tutf8 v7/usr/doc/awk | less -r). it explains:

> Array elements may be named by non-numeric values, which gives awk a capability rather like the associative memory of Snobol tables. (...) There is an alternate form of the for statement which is suited for accessing the elements of an associative array:

this pdf has evidently been retypeset from the troff sources from the open-source 7th edition release, but without the correct bibliographic database, so the references are missing. a comment in the troff source says:

> ....It supersedes TM-77-1271-5, dated September 8, 1977.

but possibly that reference is inaccurate

python goes beyond merely having dicts 'directly in the language'. python's primary data structure is the dict; among other things, it uses dicts for modules, (most) class instances, associating methods with classes, the locals() user interface to the local-variable namespace, and passing keyword arguments to functions and methods. that is, it uses associative arrays to simulate lots of other data structures, as you are obliged to do in awk, lua, and js. so where did python get dicts?

python got dicts (and tuples) from abc, a teaching language which wikipedia claims was started in 01987, 8 years after awk's release, and added conventional arrays (lists) back in. the five data types in abc https://homepages.cwi.nl/~steven/abc/language.html are numbers, strings, compounds (called tuples in ml), lists (really multisets because they're implicitly sorted), and tables (dictionaries), which are awk's 'associative arrays'—and, as in awk, js, lua, and tcl, they're used to provide the functionality of conventional arrays as well

however, lambert meertens credits the use of tables in abc to jack schwartz's setl https://inference-review.com/article/the-origins-of-python rather than to awk. he says of the addition of tables to b (the early name for abc, not to be confused with the b that was an earlier version of c)

> Having coded a few algorithms in SETL, I had experienced its power firsthand—a power that stemmed entirely from its high-level inbuilt data types. Particularly powerful were sets and maps, also known as “associative arrays,” containing data that can be indexed not only by consecutive integers but by arbitrary values. A programmer could introduce a simple database of quotations named whosaid, in which the value ”Descartes” could be stored in the location whosaid[”cogito ergo sum”]. These high-level types made it possible to express algorithms that required many steps in B1 using just a few steps in SETL. In a clear violation of the Fair-Expectation Rule, B1 allowed only integers as array indices. This design decision had been driven by fear: we had been concerned that aiming too high would make our language unimplementable on the small personal computers that were starting to appear on the market. But Dewar, in particular, convinced me that this meant we were designing for the past, not the future. This led us to redesign the system of data types for our beginners’ language. This time we used only the criteria of ease of learning and ease of use to select candidate systems. The winner turned out to be remarkably similar to the data type system of SETL. The set of possible data type systems to choose from was very large, and to make the process more manageable I had written a program to select the competitive (Pareto-optimal) candidate systems. Interestingly, but quite incidentally, that selection program itself was written in SETL. The winning type system became that of B2, and remained unchanged in the final iteration, released in 1985 under the name “ABC.”

'associative arrays', of course, is the term used by awk

this story of adding associative arrays to abc only for b2 is somewhat complicated by the fact that the version of b (b1?) in meertens's 01981 'draft proposal for the b programming language' https://ir.cwi.nl/pub/16732 already includes tables, three years after the release of awk as part of 7th edition; p. 6 (9/91) says,

> Tables are somewhat like dictionaries. A short English-Dutch dictionary (not sufficient to maintain a conversation) might be (...) Table entries, like entries in a dictionary, consist of two parts. The first part is called the key , and the second part is called the associate. All keys must be the same type of value, and similarly for associates. A table may be written thus: {[’I’]: 1; [’V’]: 5; [’X’]: 10}.

> If this table has been put in a target roman, then roman[’X’] = 10.

note that this is also awk's syntax for indexing an associative array, though it doesn't have a syntax for writing one down.

(to be continued; hn says, 'that comment included too many facts')

carapace · 2024-07-18T06:03:09.000000Z

> The set of possible data type systems to choose from was very large, and to make the process more manageable I had written a program to select the competitive (Pareto-optimal) candidate systems. Interestingly, but quite incidentally, that selection program itself was written in SETL.

Wow, now that I'd like to see.

Awesome and informative comments, cheers!

fuzztester · 2024-07-18T17:08:21.000000Z

I can't even make a guess as to how such a program could be written.

anyone has an idea?

kragen · 2024-07-19T03:39:24.000000Z

it sounds a little bogus to me, but i can outline some ways to approach it

if you have a set of features, generating its powerset is fairly simple

    >>> powerset = lambda xs: [x0 + others for others in powerset(xs[1:]) for x0 in [[], [xs[0]]]] if xs else [[]]
    >>> powerset(['int', 'float', 'set', 'array'])
    [[], ['int'], ['float'], ['int', 'float'], ['set'], ['int', 'set'], ['float', 'set'], ['int', 'float', 'set'], ['array'], ['int', 'array'], ['float', 'array'], ['int', 'float', 'array'], ['set', 'array'], ['int', 'set', 'array'], ['float', 'set', 'array'], ['int', 'float', 'set', 'array']]

then you just need some way to calculate various merits of the different designs. one merit is simplicity, which you could maybe try to operationalize as something like this:

    >>> [(d, {'simplicity': 10 - len(d)}) for d in powerset(['int', 'set', 'array'])]
    [([], {'simplicity': 10}), (['int'], {'simplicity': 9}), (['set'], {'simplicity': 9}), (['int', 'set'], {'simplicity': 8}), (['array'], {'simplicity': 9}), (['int', 'array'], {'simplicity': 8}), (['set', 'array'], {'simplicity': 8}), (['int', 'set', 'array'], {'simplicity': 7})]

comparing two merit-maps such as {'simplicity': 8} and {'simplicity': 6} for pareto optimality is simple enough with some thought, especially if we assume the keys are the same:

    >>> some_way_better = lambda a, b: any(a[k] > b[k] for k in a)
    >>> defeats = lambda a, b: some_way_better(a, b) and not some_way_better(b, a)
    >>> defeats({'simplicity': 9}, {'simplicity': 8})
    True
    >>> defeats({'simplicity': 8}, {'simplicity': 9})
    False
    >>> defeats({'simplicity': 9, 'turing-complete': 0}, {'simplicity': 8, 'turing-complete': 1})
    False
    >>> defeats({'simplicity': 9, 'turing-complete': 1}, {'simplicity': 8, 'turing-complete': 1})
    True

then it's easy enough to calculate which of a set of candidate designs can be eliminated because some other design defeats them

the bogus part is how you automatically calculate the various merits of a hypothetical collection of language features

carapace · 2024-07-19T04:21:37.000000Z

Yeah, what was the representation of these "data type systems" and what were the metrics I wonder.

kragen · 2024-07-18T06:20:14.000000Z

i'm glad they were helpful!

kragen · 2024-07-18T03:17:30.000000Z

a more recent set of slides on the relation between abc and python is https://www.cwi.nl/documents/195216/Meertens-20191121.pdf which describes again how abc was started in 01975. this helpfully clarifies the timeline: b0 was 01975; b1 was 01978; b2 was 01979; and b∞ = abc was 01985. so specifically the point at which setl inspired the replacement of conventional arrays in b1 with associative arrays in b2 was 01979, which was the year 7th edition unix was released and the aho, weinberger, and kernighan paper was published in sp&e

a question of some interest to me here is what platform they were developing abc on in 01979. clearly it couldn't have been the ibm pc, which wouldn't come out until 01983 (and as far as i know abc on the ibm pc only runs under cygwin or 32-bit microsoft windows), or macos (which came out in 01984) or atari tos, which wouldn't come out until 01985. and so far i haven't seen any mention in the history of abc of other operating systems of the time like cp/m, vm/cms, dg rdos, tenex, or tops-20. the most likely platform would seem to have been unix, on which awk was one of the relatively few programming languages available. perhaps at some point i'll run across an answer to that question in the abc papers

python adopted awk's syntax for putting 10 into roman['x'], which was `put 10 in roman['x']` in abc, but `roman['x'] = 10` in awk and python. abc's syntax is uppercase, presumably case-insensitive, separates words with apostrophes, and departs widely from conventional infix syntax. python's syntax is case-sensitive, mostly lowercase, and conventionally infix, features that have become common through the influence of unix. python's control structures are for, while, and if/elif/else, as in algol and in abc, and indentation-sensitive as in abc, but uses a conventional ascii syntax rather than abc's scratch-like syntax-directed editor

abc was statically typed with a hindley-milner type system ('the type system is similar to that of lcf', p. 15 (18/91) of the draft proposal), while python is dynamically typed, like smalltalk, lisp, and awk

if meertens got his daring notion of storing everything in associative arrays from awk, he certainly doesn't mention it. instead he mentions setl a lot! the draft proposal doesn't cite awk but it also doesn't cite setl; it cites the algol-68 report, milner's lcf typing paper, a cleaveland and uzgalis paper about grammars, gehani, and three of his own papers, from 01976, 01978, and 01981. unfortunately i can't find any of those earlier meertens papers online

the wikipedia page about setl says

> SETL provides two basic aggregate data types: (unordered) sets, and tuples.[1][2][5] The elements of sets and tuples can be of any arbitrary type, including sets and tuples themselves, except the undefined value om[1] (sometimes capitalized: OM).[6] Maps are provided as sets of pairs (i.e., tuples of length 2) and can have arbitrary domain and range types.[1][5]

but it's citing papers about setl from 01985 there, well after awk had supposedly popularized the notion of associative arrays

however, in meertens's essay on python's history, he cites a 01975 paper on setl! https://www.softwarepreservation.org/projects/SETL/setl/doc/...

> Jacob T. Schwartz. ON PROGRAMMING: An Interim Report on the SETL Project. Part I: Generalities; Part II: The SETL Language and Examples of Its Use. Computer Science Department, Courant Institute of Mathematical Sciences, New York University, revised June 1975.

this discusses how setl represented data in memory starting on p. 57 (57/689). it used hash tables to represent sets, including sets of tuples, rather than the ill-advised balanced-tree approach used by abc. (python, like awk and setl, uses hash tables.) on pp. 62–63 (62–63/689) it explains:

> The hash code of a tuple is taken to be the hash code of its first component, for reasons that will become clear in the next section. The hash code of a set is the exclusive or of the hash codes of all its members. (...)

> — Tuples in Sets —

> Though expressible in terms of the membership test, "with", and "less" operations, functional evaluation plays so important a role in SETL algorithms that we treat it as a primitive.

> SETL makes three types of set-related functional evaluation operators available:

> - f(x)

> - f{x}

> - f[s]

> The most fundamental of these is f{x}, which invokes a search over f for all n-tuples that begin with x (n ≥ 2), and which yields as result the set of all tails of these n-tuples. More precisely, in SETL:

> f{x} = if #y eq 2 then y(2) else tℓ y, y ∈ f | type y eq tupl and #y ge 2 and hd y eq x}

> The operation f(x) has a similar definition but includes a single valuedness check:

> f(x) = if #f{x} eq 1 then ∋f{x} else Ω

> The operation f[s] is adequately defined in terms of f{x}:

> f[s] = [+: x ∈ s] f{x}

i am fairly confident that the f{x} definition translates into current vernacular python as the set-comprehension {y[2] if len(y) == 2 else y[1:] for y in f if type(y) == tuple and len(y) >= 2 and y[0] == x}.

so, it becomes clear that already in 01975 setl treated sets of tuples as maps, which is to say associative arrays, but it didn't use the 'associative array' terminology used by meertens in 01981, or for that matter 'maps'. to look up an element in the map, it didn't use the f[x] notation used by python, awk, and abc; instead it used f(x). further explanation on pp. 64–65 (64–65/689) clarifies that really it is more accurate to think of 'sets of tuples' as trees; each item in the tuple entails following an additional level of pointers to a further hash table

(in a number of other notational details, python and presumably abc follows setl: start: or start:end for array index ranges, + for string concatenation, * for string repetition, boolean operators spelled out as and, or, and not. but indexing maps is far from the only difference)

abc (including b as described in the 01981 report) also seems to lack the f{x} operation and its possibility of associating an arbitrary-sized set of values with each key. this is a nontrivial semantic divergence

so if abc got its idea of tables from setl, but used awk's terminology, notation, and semantics for them (and its own ill-conceived balanced-tree implementation, used by neither), and decided to adopt the table idea in the year when awk was released, probably on the platform that awk was released on, i think it's reasonable to assign some share of the credit for abc's tables to awk? even if not all of it

but if that's so, then why didn't meertens credit aho, weinberger, and kernighan? i don't know. maybe awk's loosey-goosey nature was repugnant to him. maybe weinberger is jewish and meertens is secretely anti-semitic. maybe meertens thought that awk's loosey-goosey nature would be repugnant to the dijkstra-honoring dutch computer science establishment. maybe aho insulted meertens's favorite beer one time when he visited the netherlands. or maybe he thought it would be unfair for aho, weinberger, and kernighan to steal the thunder of schwartz, who did after all precede them in developing a general-purpose language whose memory was based entirely on hash tables. from a certain point of view that would be like crediting carl sagan with the theory of relativity because he explained it on nova

audleman · 2024-07-17T20:48:30.000000Z

I read this comment in the spirit of "you might not have known that K did much more that write awk, but he's a genius." Which I didn't know, so I appreciate the context.

1234554321a · 2024-07-17T20:37:31.000000Z

He’s the closest man to C still alive. Yet it’s not really “his” accomplishment. C was created by Ritchie alone, Kernighan only wrote the book.

aap_ · 2024-07-17T20:48:04.000000Z

I would say Ken Thompson and Steve Johnson are the closer.

fuzzy_biscuit · 2024-07-17T18:47:33.000000Z

I know the title said brief, but it still took me by surprise that there were only three questions.

phatfish · 2024-07-17T21:14:07.000000Z

No excuses not to read the whole article before commenting this time!

dheera · 2024-07-17T20:06:55.000000Z

[flagged]

maksum · 2024-07-17T20:09:23.000000Z

I thought I could escape this meme on hn, but alas

Jtsummers · 2024-07-17T20:11:49.000000Z

Just downvote or flag. It looks like you only have 17 karma so I don't think you can do either (I know you can't downvote, I'm not sure what, if any, threshold there is for flagging), so the best thing to do is to ignore it. Contribute to the site until you get enough karma, and then downvote/flag this kind of nonsense.