I love the brevity of regular expressions and use them on a daily basis. It is the same argument that keeps me returning to K: the syntax is terse and compact, the semantics are simple and composable, and your eyes get used to it.
Beyond a point, however, I can't read my own regexes after a month away, which is why I use Perl's /x modifier extensively to split regex components across multiple lines and document them thoroughly, even in throwaway scripts, because I don't always throw them away!
For example:
  $_ =~ m/^                       # anchor at beginning of line
        The\ quick\ (\w+)\ fox    # fox adjective
        \ (\w+)\ over             # fox action verb
        \ the\ (\w+)\ dog         # dog adjective (space escaped, or /x ignores it)
        (?:                       # whitespace-trimmed comment:
          \s* \# \s*              #   whitespace and comment token
          (.*?)                   #   captured comment text; non-greedy!
          \s*                     #   any trailing whitespace
        )?                        # this is all optional
        $                         # end of line anchor
       /x;                        # allow whitespace
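For comparison, Python's re.VERBOSE flag (alias re.X) behaves like Perl's /x: unescaped whitespace outside character classes is ignored and an unescaped # starts a comment. Below is a sketch of the same pattern in Python. Note the escaped space before dog: an unescaped one would be ignored, and the pattern would then match "lazydog" rather than "lazy dog".

```python
import re

# re.VERBOSE is Python's analogue of Perl's /x, so literal spaces and "#"
# inside the pattern must be escaped.
FOX = re.compile(r"""
    ^                           # anchor at beginning of line
    The\ quick\ (\w+)\ fox      # fox adjective
    \ (\w+)\ over               # fox action verb
    \ the\ (\w+)\ dog           # dog adjective
    (?:                         # whitespace-trimmed comment:
        \s* \# \s*              #   whitespace and comment token
        (.*?)                   #   captured comment text; non-greedy!
        \s*                     #   any trailing whitespace
    )?                          # this is all optional
    $                           # end of line anchor
""", re.VERBOSE)

m = FOX.match("The quick brown fox jumps over the lazy dog  # a classic  ")
print(m.groups())  # ('brown', 'jumps', 'lazy', 'a classic')
```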
This is where K fails me. It may not be the language's fault, but everyone in the community has bought into this strange idiomatic style. I can't imagine debugging it, checking it for correctness, or foisting it on a less experienced developer. Here's a canonical example from their website, an XML parser.
Most of that document _is_ comments: there's a comment on almost every line, much like your Perl example. Comments begin with a "/" character that has no function to its left (e.g. one preceded by whitespace).
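The comment rule is easy to state in code. This is a rough Python sketch, assuming only what the comment above says: a "/" starts a comment at the beginning of a line or when preceded by whitespace, whereas a "/" directly after a function (as in +/ for sum) is not a comment. It ignores string literals, so it is not a real K lexer, and split_comment is a hypothetical name.

```python
def split_comment(line):
    """Split one line of K source into (code, comment).

    Assumption: '/' starts a comment at column 0 or after whitespace.
    String literals are not handled, so a ' /' inside quotes is misread.
    """
    for i, ch in enumerate(line):
        if ch == "/" and (i == 0 or line[i - 1].isspace()):
            return line[:i].rstrip(), line[i + 1:].strip()
    return line.rstrip(), None

print(split_comment("x:1 2 3 / a list"))  # ('x:1 2 3', 'a list')
print(split_comment("+/x"))               # ('+/x', None)
```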
First we have some constants (L, W, B, S, R), which refer to the left bracket, whitespace (which includes blank), blank space, slash, and right bracket. We've also got some utility functions (cut; join). These are simple enough that they don't require any special explanation for the K programmer who reads this.
Next comes a function (oc) that produces an XML entity from a character. The author assumes octal is required and so (needlessly) converts to it. The octal string (with a leading zero) is appended to ";&#" and then rotated so the ";" ends up at the end (1! is cute). I would probably write this differently, since 1!";&#",$_ic is shorter.
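To make the rotation trick concrete, here is a Python sketch of the encoder as described in this comment. It is a reconstruction from the description, not a translation of the actual xml.k source, and the names rotate, oc_octal and oc_decimal are hypothetical. rotate mimics K's n!s (rotate left by n). Python's html.unescape stands in for a standard decoder; because numeric character references without an x are decimal, the octal entity for "A" is read back as character 101, "e".

```python
import html

def rotate(s, n):
    """Rotate a string left by n places, like K's n!s (negative n rotates right)."""
    if not s:
        return s
    n %= len(s)
    return s[n:] + s[:n]

def oc_octal(c):
    """Hypothetical reconstruction of oc as described: octal code with a leading
    zero, appended to ";&#", then rotated so ";" moves to the end."""
    return rotate(";&#" + "0" + format(ord(c), "o"), 1)

def oc_decimal(c):
    """The shorter decimal variant suggested above: 1!";&#",$_ic."""
    return rotate(";&#" + str(ord(c)), 1)

print(oc_octal("A"), oc_decimal("A"))  # &#0101; &#65;
print(html.unescape(oc_octal("A")))    # e  (the digits are read as decimal 101)
print(html.unescape(oc_decimal("A")))  # A
```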
Then there's a function that does the reverse: after rotating, it cuts off the first three characters (the ";&#" string again) and converts the octal digits back into a number. This is probably wrong, because real XML documents will most likely use decimal entities, but perhaps the author wasn't dealing with those. I would certainly rewrite this too if I changed oc as above.
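The reverse direction, sketched the same way (again a reconstruction from the description, with hypothetical names): rotate the ";" back to the front, drop the three characters ";&#", and parse the rest. The last line shows the failure mode mentioned above: a decimal entity fed to the octal decoder silently yields the wrong character.

```python
def rotate(s, n):
    """Rotate a string left by n places, like K's n!s (negative n rotates right)."""
    if not s:
        return s
    n %= len(s)
    return s[n:] + s[:n]

def co_octal(e):
    """Reverse of the octal encoder: rotate ';' to the front, drop ';&#',
    and read the remaining digits as octal."""
    return chr(int(rotate(e, -1)[3:], 8))

def co_decimal(e):
    """Same steps, but reading the digits as decimal."""
    return chr(int(rotate(e, -1)[3:]))

print(co_octal("&#0101;"))  # A
print(co_decimal("&#65;"))  # A
print(co_octal("&#65;"))    # 5  (decimal 65 misread as octal 65, i.e. 53)
```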
Now we have the helper functions xc and cx, whose names suggest they convert character-to-XML and XML-to-character respectively. That's just a stylistic observation; we can also see it from the comment, or by reading the code (if we know what XML is). These implementations are pretty basic, just using ssr to do repeated search-and-replace on the entities (note that ssr treats the ? character as a wildcard matching any character).
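A Python sketch of an xc/cx pair in the same spirit: a small stand-in for ssr in which "?" matches any single character and the replacement may be a string or a function of the matched text. That ssr signature, the set of five special characters, and the octal entities are assumptions for illustration, not K's documented API or the exact xml.k code.

```python
import re

def ssr(text, pattern, repl):
    """Hypothetical stand-in for K's ssr: replace every match of pattern,
    where '?' matches any single character. repl is a string or a function
    applied to each matched substring."""
    regex = "".join("." if ch == "?" else re.escape(ch) for ch in pattern)
    func = repl if callable(repl) else (lambda _matched: repl)
    return re.sub(regex, lambda m: func(m.group(0)), text, flags=re.DOTALL)

def oc(c):  # character -> octal entity with a leading zero, e.g. "<" -> "&#074;"
    return "&#0" + format(ord(c), "o") + ";"

def co(e):  # octal entity -> character, e.g. "&#074;" -> "<"
    return chr(int(e[2:-1], 8))

# "&" must come first, or the "&" inside entities produced earlier would be re-escaped.
SPECIAL = "&<>\"'"

def xc(s):
    """Character-to-XML: one ssr pass per special character."""
    for c in SPECIAL:
        s = ssr(s, c, oc(c))
    return s

def cx(s):
    """XML-to-character: every entity above is exactly six characters long,
    so a single wildcard pattern finds them all."""
    return ssr(s, "&#???;", co)

print(xc('a<b & "c"'))      # a&#074;b &#046; &#042;c&#042;
print(cx(xc('a<b & "c"')))  # a<b & "c"
```

The fixed width is what lets the ? wildcard do the job here: with octal codes and a leading zero, all five entities are six characters long. The same shortcut would misread a decimal entity such as &#101; as octal.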
You get used to it.
> Why is this line noise considered acceptable?
One major challenge in reading inscrutable Perl scripts is knowing where execution begins. Perl has so many parsing rules that you need either wizardry or patience to pull a script apart, but K is extremely regular: there's only one way to parse it. Shortly after learning it, you also learn that you can insert trace statements, which don't change the meaning of the rest of the expression, to explore a new operator (or a use of an operator you didn't know about). I mention this especially because it is extremely hard to do in Perl (and even in other Iverson languages, including APL and J).
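A toy Python model of the two properties claimed here, restricted (by assumption) to flat expressions of numbers and the dyadic +, - and *. K evaluates strictly right to left with no operator precedence, so 2*3+4 is 2*(3+4) = 14. An identity "trace" that records each intermediate value leaves the result unchanged, which is what makes inserting one safe (in q, for example, 0N! is commonly used this way). The function names are hypothetical.

```python
import operator

# Dyadic operators for this toy model only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def trace(value, log):
    """Identity with a side effect: record the value and return it unchanged."""
    log.append(value)
    return value

def eval_right_to_left(tokens, log=None):
    """Evaluate a flat expression K-style: no precedence, strictly right to left.

    tokens alternate operand and operator, e.g. [2, "*", 3, "+", 4].
    If log is given, each intermediate result is passed through trace().
    """
    acc = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        acc = OPS[tokens[i]](tokens[i - 1], acc)
        if log is not None:
            acc = trace(acc, log)
    return acc

log = []
print(eval_right_to_left([2, "*", 3, "+", 4]))       # 14, i.e. 2*(3+4); Python's 2*3+4 is 10
print(eval_right_to_left([2, "*", 3, "+", 4], log))  # 14, unchanged by tracing
print(log)                                            # [7, 14]
```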
Line noise is a subjective quality that goes away (at least in this case) as you become more fluent in K. I don't believe this is necessarily true of all compact languages, though.
If the comments were good (more like the ones you wrote), the ratio of comments to code would be even higher. And, as with writing assembly, the risk of comments getting out of sync with the code is higher, too.
Thank you. The narration is what happens in my brain when I read it. I don't need it in the source files. Keeping the file short is the best way to keep it consistent (avoiding what you call "getting out of sync").
This seems similar to how Unix commands have both short and long names for flags. Single-character flags are easier to type, but also easier to mistype or misread since there is less redundancy.
It seems like K would be particularly well suited to having more than one syntax. The short syntax, once you get used to it, would be better for keyboard input and expert whiteboard discussions, but it might also be nice to have a standard syntax that is longer and closer to what most people expect. An editor could translate automatically between the short and long forms, which would also help make sure you typed what you think you did.
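A toy sketch of the two-way translation idea, using as its only vocabulary the three short/long pairs named in the reply below (+: flip, =: group, ?: distinct). It swaps whitespace-separated tokens and nothing else; a real editor would need a proper K tokenizer, since K code is rarely space-separated. All names here are hypothetical.

```python
# Toy vocabulary: only the three short/long pairs mentioned in this thread.
SHORT_TO_LONG = {"+:": "flip", "=:": "group", "?:": "distinct"}
LONG_TO_SHORT = {name: prim for prim, name in SHORT_TO_LONG.items()}

def translate(src, table):
    """Swap whitespace-separated tokens found in table; leave all others alone."""
    return " ".join(table.get(tok, tok) for tok in src.split())

def to_long(src):
    return translate(src, SHORT_TO_LONG)

def to_short(src):
    return translate(src, LONG_TO_SHORT)

print(to_long("?: =: x"))            # distinct group x
print(to_short("distinct group x"))  # ?: =: x
```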
I think that's part of the theory behind q, which replaces the monadic (unary) forms of operators with names: +: becomes flip, =: becomes group, ?: becomes distinct, and so on. I'm not convinced, though, because Python+numpy has most of these operations (and those it lacks aren't particularly difficult to implement), so it seems reasonable that you could build an environment almost as good as q[1].
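As a rough illustration of how little code these three primitives need on simple lists, here are plain-Python versions. They imitate q's behaviour only for flat lists: group returns a dictionary from each distinct item to the indices where it occurs, in order of first occurrence. numpy's np.transpose covers flip for arrays, whereas np.unique sorts its result instead of keeping first-occurrence order, so distinct is written out by hand.

```python
def flip(m):
    """Like q's flip (+: in k) on a list of equal-length lists: transpose."""
    return [list(col) for col in zip(*m)]

def group(xs):
    """Like q's group (=: in k): map each distinct item to the indices
    where it occurs, in order of first occurrence."""
    out = {}
    for i, x in enumerate(xs):
        out.setdefault(x, []).append(i)
    return out

def distinct(xs):
    """Like q's distinct (?: in k): unique items in order of first occurrence."""
    return list(dict.fromkeys(xs))

print(flip([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
print(group("mississippi"))          # {'m': [0], 'i': [1, 4, 7, 10], 's': [2, 3, 5, 6], 'p': [8, 9]}
print(distinct("mississippi"))       # ['m', 'i', 's', 'p']
```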
But whilst the k/q operators are certainly useful, the Key thing is the notation. The notation is really valuable, and it's hard to appreciate until you understand it well enough that it starts changing how you think about programming: numpy.matmul(x,y) might do the same thing as x+.×y, but the latter suggests more. I recommend reading Iverson's paper[2] on the subject, although reading §5.2 before the beginning may help put into context what exactly is meant by notation here.
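One way to see what "suggests more" means: in APL, +.× is one instance of the general inner product f.g, and swapping in other functions gives other useful operations. Here is a plain-Python sketch of that generalization (the name inner is hypothetical). It reduces left to right, which agrees with APL's right-to-left reduction only for associative f such as +, min, or logical or.

```python
import operator
from functools import reduce

def inner(f, g, x, y):
    """Generalized inner product in the spirit of APL's f.g:
    result[i][j] is the f-reduction over k of g(x[i][k], y[k][j])."""
    cols = list(zip(*y))  # columns of y
    return [[reduce(f, map(g, row, col)) for col in cols] for row in x]

# (+, *) is ordinary matrix multiplication, like numpy.matmul.
print(inner(operator.add, operator.mul, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]

# (min, +) on a distance matrix gives shortest paths of at most two hops.
INF = float("inf")
d = [[0, 1, INF], [INF, 0, 2], [INF, INF, 0]]
print(inner(min, operator.add, d, d))
# [[0, 1, 3], [inf, 0, 2], [inf, inf, 0]]
```

The point of the design is that matrix multiplication stops being a special function and becomes one member of a family indexed by the pair (f, g).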
[1]: There's still a lot missing: good tables, efficient on-disk representation of data, IPC, views, and more, all of which will be hard to do in Python without limiting yourself to a subset of Python that might not feel like Python anymore anyway.
I've read a lot of human-oriented, commented code that meant nothing to me because the overall state space/architecture was fuzzy.
Whether it's a one-liner or a framework, readability is quite secondary IMO.
PS: this also connects to mathematicians' views on naming vs. structure: names are mostly arbitrary; it's the structure that drives the logic and the device.
While I can understand how that code could be intimidating to a programmer with a more “traditional” background, there are 26 (non-empty) lines of comments and only 15 lines of code.
Ten of those 26 are "exercise", not comments. The 6 lines above those are incomprehensible, and I can find only two others, which fail to explain what the code is supposed to do.
In this context, "intimidating" and "less traditional" are euphemisms that stem from cognitive dissonance reduction.
(Source of the Perl regex example: https://www.perl.com/pub/2004/01/16/regexps.html/)
K XML parser example: https://a.kx.com/a/k/examples/xml.k
Where's the pedagogy? Where are the comments? Why is this line noise considered acceptable?