
I love the brevity of regular expressions and use them on a daily basis. It is the same argument that keeps me returning to K: the syntax is terse and compact, the semantics are simple and composable, and your eyes get used to it.

Beyond a point, however, I cannot read my own regexes after a month's absence, which is why I use perl's /x modifier extensively to split regex components onto multiple lines and to document them thoroughly, even if they are for throwaway scripts, because I don't always throw them away!

For example:

    $_ =~ m/^                         # anchor at beginning of line
            The\ quick\ (\w+)\ fox    # fox adjective
            \ (\w+)\ over             # fox action verb
            \ the\ (\w+)\ dog         # dog adjective
            (?:                       # whitespace-trimmed  comment:
              \s* \# \s*              #   whitespace and comment token
              (.*?)                   #   captured comment text; non-greedy!
              \s*                     #   any trailing whitespace
            )?                        # this is all optional
            $                         # end of line anchor
           /x;                        # allow whitespace
(source: https://www.perl.com/pub/2004/01/16/regexps.html/)

This is where K fails me. It may not be a fault of the language, but everyone in the community has bought into this strange idiomatic style. I can't imagine debugging it, or checking it for correctness, or foisting it on a less experienced developer. Here's a canonical example, an XML parser, from their website.

https://a.kx.com/a/k/examples/xml.k

Where's the pedagogy? Where are the comments? Why is this line noise considered acceptable?




> https://a.kx.com/a/k/examples/xml.k

> Where's the pedagogy? Where are the comments?

Most of that document _is_ comments. There's a comment on almost every line, very similar to your perl example. Comments begin with a "/" character that has nothing directly to its left (e.g. whitespace or the start of the line).

First we have some constants (L,W,B,S,R) which refer to the left bracket, whitespace (which includes the blank), the blank space, the slash, and the right bracket. We've also got some utility functions (cut;join). These are simple enough that they don't require any special explanation for the K programmer who reads this.

Then we have a function that produces an xml entity from a character. The author assumes octal is required, so (needlessly) converts to that. The octal string (with a leading zero) is concatenated onto ";&#", then rotated so the ";" appears at the end (1! is cute). I would probably write this differently, because 1!";&#",$_ic is shorter.
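
For readers who don't know K, here is a rough Python sketch of the encoding being described; the function name and the exact digit padding are mine, not the author's:

    def char_to_entity(ch):
        # format the code point in octal with a leading zero, as described above
        body = ";&#" + "0" + format(ord(ch), "o")   # e.g. 'e' -> ';&#0145'
        return body[1:] + body[:1]                  # rotate left by one: ';' moves to the end

    print(char_to_entity("e"))   # &#0145;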

We then have a function that does the reverse: it rotates, cuts off the first three characters (the ";&#" string again), and converts the remaining octal digits back into decimal. This is probably wrong, since real XML documents will prefer decimal entities, but perhaps the author wasn't dealing with those. I would certainly write this differently if I changed oc (as above).
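
And a matching sketch of the decoding, again with a name of my own:

    def entity_to_char(ent):
        # rotate right by one, so '&#0145;' becomes ';&#0145' again
        body = ent[-1] + ent[:-1]
        # cut off the first three characters (';&#') and read the rest as octal
        return chr(int(body[3:], 8))

    print(entity_to_char("&#0145;"))   # e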

Now we have the helper functions xc and cx, whose names suggest they convert character-to-xml and xml-to-character respectively. That's a stylistic observation; we can also see it from the comment, or by reading the code (if we know what XML is). The implementations are pretty basic, just using ssr to do repeated search/replace on the entities (note that ssr knows the ? character means "any").
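
A rough Python rendering of the xc/cx idea, repeated search-and-replace over a small entity table; the table and the details are illustrative, not taken from the K file:

    import re

    # '&' must be replaced first, or later replacements would mangle the
    # entities already inserted for '<' and '>'
    ENTITIES = {"&": "&#0046;", "<": "&#0074;", ">": "&#0076;"}   # octal code points

    def xc(text):
        # character-to-xml: replace each special character with its entity
        for ch, ent in ENTITIES.items():
            text = text.replace(ch, ent)
        return text

    def cx(text):
        # xml-to-character: replace each octal entity with its character
        return re.sub(r"&#([0-7]+);", lambda m: chr(int(m.group(1), 8)), text)

    print(xc("a < b & c"))       # a &#0074; b &#0046; c
    print(cx(xc("a < b & c")))   # a < b & c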

You get used to it.

> Why is this line noise considered acceptable?

One major challenge in reading inscrutable perl scripts is knowing where execution begins. Perl has so many parsing rules that you need either wizardry or patience to pull a script apart. K, by contrast, is extremely regular: there's only one way to parse it, and shortly after learning the language you learn that you can insert trace expressions that don't change the meaning of the rest of the statement, which is a quick way to learn a new operator (or a new use of an operator you didn't know). I note this especially because it is extremely hard to do in perl (and even in other Iverson languages, including APL and J).
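
The "trace without changing the meaning" trick isn't K-specific; in any language it amounts to an identity function with a printing side effect. A rough Python sketch:

    def trace(label, x):
        # print an intermediate value, then return it unchanged, so wrapping any
        # sub-expression in trace(...) does not change the surrounding result
        print(label, repr(x))
        return x

    total = sum(trace("squares:", [n * n for n in range(5)]))   # prints the list, then sums it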

Line noise is a subjective quality that goes away (at least in this case) when you become more fluent in K. I don't believe this is necessarily true of all compact languages though.


If the comments were good (more like what you wrote), then the ratio of comments to code would be even higher. And as with writing assembly, the risk of the comments getting out of sync with the code is higher, too.


Thank you. The narration is what happens in my brain when I read it. I don't need it in the source files. Keeping the file short is the best way to keep it consistent (what you refer to as "getting out of sync").


This seems similar to how Unix commands have both short and long names for flags. Single-character flags are easier to type, but also easier to mistype or misread since there is less redundancy.

It seems like K would be a particularly suitable language for having more than one syntax. The short syntax, once you get used to it, would be better for keyboard input and expert whiteboard discussions, but it might also be nice if there were a standard syntax that was longer and closer to what most people expect? An editor could automatically translate between short and long syntax, and this would be helpful for making sure you typed what you think you did.


I think that's part of the theory behind q, which trades the monadic (unary) definitions of operators for names, so +: becomes flip, =: becomes group, ?: becomes distinct, and so on. I'm not convinced though, because Python+numpy has most of these operations (and those it doesn't have aren't particularly difficult to implement), so it seems reasonable that you could implement an environment almost as good as q[1].

But whilst the k/q operators are certainly useful, the Key thing is the notation. The notation is really valuable, and it seems hard to get it until you understand the notation well enough that it starts changing how you think about programming: numpy.matmul(x,y) might do the same thing that x+.y does, but the latter suggests more. I recommend reading Iverson's paper[2] on the subject, although you might find reading §5.2 before the beginning to be helpful in putting into context what exactly is meant by notation here.
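
For what it's worth, a minimal numpy example of the spelling mentioned above (nothing K-specific here, just the matrix product both notations compute):

    import numpy as np

    x = np.array([[1, 2], [3, 4]])
    y = np.array([[5, 6], [7, 8]])

    # the numpy spelling of the matrix product that the inner-product
    # notation writes as x+.y
    print(np.matmul(x, y))   # [[19 22]
                             #  [43 50]]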

[1]: There's still a lot missing: good tables, efficient on-disk representation of data, IPC, views, and more, all of which will be hard to do in Python without limiting yourself to a subset of Python that might not feel like Python anymore anyway.

[2]: http://www.eecg.toronto.edu/~jzhu/csc326/readings/iverson.pd...


Slight reversal.

I've read a lot of human-oriented, commented code that meant nothing to me, because the overall state space / architecture was fuzzy.

Whether it's a one-liner or a framework, readability is quite secondary IMO.

ps: this also connects to the mathematician's view about naming vs structure: names are mostly arbitrary; it's the structure that drives the logic and the device.


For regex, named capture groups can also be used, so keeping one group per line isn't as necessary:

    $_ =~ m/^                                     # anchor at beginning of line
            The\ quick\ (?<adjective>\w+)\ fox    # fox adjective
            \ (?<verb>\w+)\ over                  # fox action verb
            ...
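
The same idea carries over to other regex engines; for comparison, a rough Python sketch using re.VERBOSE and (?P<name>...) groups, with my own example text:

    import re

    pattern = re.compile(r"""
        ^ The\ quick\ (?P<adjective>\w+)\ fox   # fox adjective
        \ (?P<verb>\w+)\ over                   # fox action verb
        """, re.VERBOSE)

    m = pattern.match("The quick brown fox jumps over the lazy dog")
    if m:
        print(m.group("adjective"), m.group("verb"))   # brown jumps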


> Where are the comments?

While I can understand how that code can be intimidating to a programmer with a more “traditional” background… there are 26 (non-empty) lines of comments, and only 15 lines of code.


10 of those 26 are "exercise" lines, not comments. The 6 lines above that are incomprehensible, and I can only find two others, which fail to explain what the code is supposed to do.

In this context, "intimidating" and "less traditional" are euphemisms that stem from cognitive dissonance reduction.


lots of comments there!

/ xml from char


I thought that was a game of life



