J and K remind me of the essay about the Lisp Curse[0], which mentions that the expressiveness of the language became a sort of Achilles heel in its culture. It talks about people writing their projects in Lisp and not expecting other people to adapt to their conventions or combine their efforts on one library, where every solution worked well enough initially - but only for one person, its author.
In K the error messages are far harder to understand than most languages, and you'd need either enough comments or many hours of experience with the language to understand what a piece of code does, due to its sheer expressiveness. Also, there's comparatively less documentation for when you're trying to do something practical. I can't imagine introducing J or K in an open source project will be helpful if other contributors have to catch up to my understanding of the language and need to internalize and remember which single-character symbols mean "print" or "iter." I imagine that at least a Python programmer could reasonably understand the gist of a similar Ruby program, merely due to the fact that "class" and "map" and "length" are sequences of characters that usually show up in both languages. Not so with K.
Still, I find the vastly different ways of doing things with K interesting to see, even if I will only use it personally.
That is fine. Most people simply cannot drive a Formula-1 car, but it would be stupid to conclude that a F-1 is a cursed car and nobody should want to drive one. Similarly, a few languages can only be used by highly trained people who understand well how it works. Lisp, Forth, and J are in that category of tools that require highly trained people. In their hand, one of these languages can do fantastic things, in the hands of an average programmer you have nothing.
Lisp without macros can be and is used by 'normal programmers'. And even with macros it's not like it's witchcraft that only a select few can handle. It's a powerful language, but nonetheless a language that everyone can learn.
That is actually a nice analogy! No one, not even the pro formula-1 drivers will want to drive a f1 car in regular traffic. Much the same with these languages (apls, lisps, forth, etc); they have their place and role, but is better not used in regular open source or commercial codes.
Actually, it's somewhat the opposite. The lisp curse refers to the ways in which lisp can easily be extended with nonstandard or bespoke features. Apl relies much more heavily on primitives; they are so expressive and general that you often don't even need to add functions. This gives the code more universality (assuming you know the language, which I expect should be a given).
It's about redundancy. Code in other languages has much more of it, and that is good for understandability, recognizability.
With less redundancy it is easier to make errors both in writing AND reading of code.
APL is like compressed code. The redundancy has been compressed out. Therefore reading a page of APL is like reading a book in other languages. Even though the text is shorter you still have to understand as much of program behavior.
Reading APL means reading something FAST and we know that it is easy to have errors whenever you try to do something fast, say like typing very fast.
> With less redundancy it is easier to make errors both in writing AND reading of code.
I don't really see that. Boilerplate code is not helpful for clarity or correctness. It's just more code that you have to write that can introduce errors.
The errors you introduce into boilerplate code are trivial syntax errors easy to spot by the parser. The difficult errors are the semantic ones because the parser can not catch them.
Think about human languages. They have large amounts of redundancy and for a good evolutionary reason I believe. It is easier to distinguish meaning when words around a given word anchor it into a specific context.
Redundant languages have a built-in error-correcting code built into them. They take more bits to transmit but those extra bits correct, and help detect errors.
This was a nice tour of how concise J can be. To see more of this stuff I highly recommend the Concrete Mathematics Companion[0], which uses J as an algebraic tool to reason about the correctness of derivations in that excellent book by Knuth.
> APL was the first language to use “monad” as a term. The popular FP meaning only appeared thirty years later.
in its, to me odd, use of the word "monad". This is incorrect. The "popular FP meaning" arose because it's a very special case of the standard concept from mathematics [1]. The mathematical terminology harkens back to the 1950s, and thus predates APL.
In Haskell, a monad is precisely a monad in the mathematical sense in the special case of the category of Haskell types (which, due to technical complications isn't exactly a perfect model for the actual Haskell types, but it's pretty close). It thus seems dishonest to me to refer to this use of the word "monad" as somehow being newer than APL's use.
> The mathematical terminology harkens back to the 1950s, and thus predates APL.
History isn't particularly logical sometimes, so there could be two competing meanings - which didn't seem too competing at the time (as well as later).
Regarding the history of the word, Wikipedia says:
"A mathematical notation for manipulating arrays was developed by Kenneth E. Iverson, starting in 1957 at Harvard University."
"The mathematician Roger Godement was the first to formulate the concept of a monad (dubbing it a "standard construction") in the late 1950s"
Similarly, those terms seem to develop in parallel:
"In 1960, he began work for IBM where he developed this notation with Adin Falkoff and published it in his book A Programming Language in 1962."
"The form defined above using bind, however, was originally described in 1965 by mathematician Heinrich Kleisli in order to prove that any monad could be characterized as an adjunction between two (covariant) functors."
So monad term arguably wasn't an established one in another meaning when Iverson employed it for APL.
Let's check the etymology of words. Two meanings of "monad" (I'm omitting the one from linear algebra) stem from the idea of oneness - in APL it's "one argument", in category theory it's "one category", as the category is being mapped to itself (same as endofunctor, but monad also has extra requirements, so needs another term). So using the same word for similar properties seems logical - again, in the absence of established different meaning.
In practice, monads in APL and FP are sufficiently different so no confusion usually takes place. Similarly FP uses terms like "lens" and "optics", which are hardly confused with earlier introduced physical terms.
In fact since the Pythagoreans [1] where the monad comes first and after the dyad and after the numbers and so on. It also was the term for unicellular organism before this was coined.
Monadic was an adjective used in logic before the 20th century, using the noun dervied by the greek root monas which gave monad and monadic (and the same with dyad, dyadic).
If I take for right what is on the french [2] and english pages [3] of wikipedia about the terminology history of the monad words its usages seems to date to around the 70 when Saunders Mac Lane named it in reference to the philosophical term.
I feels some times people tend to forget how interconnected is and was mathematics and philosophy and how term from philosophy were used in a mathematical context based on their philosophical definition.
I think "the first language to" is the operative bit.
Yes, the mathematics predates it, but in context I think "only appeared" clearly means "only appeared in a programming context".
(I guess whether you accept this depends on how you're rank ordering different types of pedantry, and exercise I am happy to leave to the discretion of the reader)
I agree with most of this article, except that - for me - grouping by table columns is a more intuitive and flexible way to control subset application than 'rank'.
k9 is in development, but blending k7 and k9 we'd get something like:
Interesting blog post. All of these are implemented as trivial operations in numpy, so while J or APL might be the originator of the syntax, this 'tool of thought' is not as esoteric as the author makes it sound, and learning J if you already know numpy (or TensorFlow or PyTorch or..) isn't that remarkable.
You've reproduced the most trivial example, which many languages make easy. I would be very interested to see another language with a rank operator, for instance.
Challenge for you: rewrite a nontrivial program in one of those frameworks, with the following restrictions:
- No iteration (including implicit iterations—map, filter; reduce is ok)
- No loops whatsoever. Recursion is ok, but should be avoided wherever possible.
- No explicitly named arguments; everything in pointfree style.
(I know, map/filter can be implemented recursively. But compared with the equivalent constructs in apl, they're about 10× as verbose, and harder to reason about and understand. Even reduce is somewhat of a gimme.)
> Rank Is actually implicitly done by numpy using a mechanism called broadcasting.
Numpy's broadcasting is scalar conformability, to which the rank operator (and general conformability) provides a general case. Example in j:
] x =. i. 2 3 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
] y =. i. 3 4
0 1 2 3
4 5 6 7
8 9 10 11
x + y NB. this will error because + expects that, if its arguments' shapes are not the same, one will be a prefix of the other
|length error
| x +y
NB. this is easy enough to fix, however
x +"2 y NB. +"2 is shorthand for +"2 2; meaning, choose rank-2 arrays from both the left and right arguments
0 2 4 6
8 10 12 14
16 18 20 22
12 14 16 18
20 22 24 26
28 30 32 34
Numpy will actually do this without the rank operator, because it uses suffix agreement rather than prefix agreement (which is absolutely bonkers—j used suffix agreement for about 5 minutes in 1990, before realising it was an awful idea). For for numpy, see if you can add:
Intelligently. (I'm sure it's not overly difficult to come up with a solution, but can you do it with a single higher-order function call which generalises to other argument shapes?)
(The j equivalent, (i. 2 3 4) + (i. 2 3) also works without trouble.)
Another example, which may be more illustrative, is the ability to perform reductions along arbitrary axes. For example:
] x =. i. 4 3
0 1 2
3 4 5
6 7 8
9 10 11
+/ x NB. sum reduced along leading axis, the default, producing an array of shape 3
18 22 26
+/"1 x NB. sum each rank-1 array (vector); or, reduce last axis, producing an array of shape 4
3 12 21 30
------------------------------------------------------------------------
Another curiosity, which I have thus far neglected, is the extent to which numpy's being 'a little more verbose' is actually incredibly important in shaping the way you approach and think about problems. The great-uncle comment also addresses this, but Iverson probably says it better than either of us can: read https://www.jsoftware.com/papers/tot.htm
To broadcast operations (such as addition) between arrays in NumPy, trailing dimensions have to be equal (or be of length 1).
In the example given above the 3D array and 2D array have shape (lengths of dimensions):
(2, 3, 4)
(2, 3)
That is - the suffixes do not agree (4 != 3 and 3 != 2) and NumPy raises an error.
However, for the same operation in J the prefixes agree:
(2, 3, 4)
(2, 3)
and the addition gives the expected result.
To add the arrays with these shapes in NumPy, one method is transpose each array (reverse order of the dimensions), add these arrays, and then transpose back:
Thinking about these as a "verbose imperative language" programmer, I'd say suffix agreement seems to make more sense to me at first glance, and I'd like to hear more about why it might be worse.
For why I think it makes sense: I think of multidimensional arrays as being arrays of arrays, and "normal" index lookup operating on the first dimension. If I have a
float[100][3]
in some context I might think of it as 100 vec3s, and I might want to do some vec3 operation on each of them. I might want to dot them all with my some other vector, or add them all to some other vector. I almost never have 100 scalars and want to apply one scalar to all elements of the corresponding vec3.
But I guess maybe this is all widely agreed on, and maybe the contentious part is just index order? Like, maybe you'd say "100 vec3s" is actually
float[3][100]
in which case prefix agreement would make more sense.
Sibling explains the difference. To explain why prefix is better—first let's look at the base case where argument shapes match. Let's assume that addition is defined on scalars; we don't have to explain that 11=5+6. Then let's consider vectors: there are two forms of scalar broadcasting for vectors. The first is addition of two equal-length vectors, for which each item of the left argument will be broadcast to the corresponding item of the right argument: 10 14=3 5+7 9. The second is the addition of a scalar to a vector (or vice versa), where the scalar is matched up with each item of the vector: 10 14=6 10+4.
For the first case, we can say that:
R[i] = x[i] + y[i] (when x and y are vectors)
(Where i is any valid array subscript; and R, x, and y are the names conventionally given to the result, left argument, and right argument of some function, respectively.)
Assume that we extend our scalar rule to arbitrary dimensions (that is, scalar+n-dimensional array will do what we expect)—there is actually a good reason for this, but we'll just assume it for now; it's a pretty intuitive rule.
Then the simple recursive rule I gave above gives you prefix agreement for any two argument shapes; just replace 'vector' with 'nonscalar'. Here are the rules for suffix agreement:
R[i] = x[i] + y[i] (when x and y have the same rank)
R[i] = x + y[i] (when y has bigger rank)
R[i] = x[i] + y (when x has bigger rank)
The prefix agreement rule is simple: add each element of x to its corresponding element in y. The suffix agreement rule is much more complex (3 rules, as opposed to just 1), and with higher-ranked arrays it gets harder to reason about which elements go together.
There's an even deeper synergy, though, which goes between prefix agreement and forks. Forks are a generalisation of a convention from calculus. There's a convention that, given functions f and g, (f+g) is a legal function such that (f+g)(x) <=> f(x)+g(x). (And ditto for other arithmetic operators.) Apl does not distinguish between user-defined functions (like f) and infix builtins (like +); x f y denotes the calling of function f with arguments x and y, same as x + y denotes the calling of function + with arguments x and y. So the f+g rule is generalised: we say that, given any functions f, g, and h, (f g h) x is equivalent to (f x) g (h x).
What does this have to do with prefix agreement? Well, all we have to do is remember that an array, as a relation of indices to elements, is very close to a function. In fact, I think that it's appropriate to say that an array is semantically a function, though its syntactic role is different. So saying x[i] is like saying 'apply function x to argument i'. Which leads very nicely—right into prefix agreement:
(f g h) x <=> (f x) g (h x)
. . .
(x + y)[i] <=> (x[i]) + (y[i])
I have want to chime in. Agree, it is great that ideas from APL have made it into the language... But there is, fortunately, a long and fruitful path still to explore... I love that J is like an alternative path to functional approaches like Haskell and how it plays in terms of providing expressive power. Still, I will not create code in J that is maintained jointly with other users from other domains (which is the case usually when you are using numpy, for example)
I like how APL made it into Common Lisp wholesale. There's a library that compiles APL to Common Lisp, allowing you to mix APL and Lisp code, working on the same data structures with both languages.
(APL being compiled to Lisp is important, not just because it gives you proper interoperability out of the box, but also because on implementations like SBCL this means APL code will get compiled, through Lisp, to native code. And you can do that at runtime too.)
Now I wish I could find something that does the same for J, because I'm sure as hell not going to buy an APL-friendly keyboard.
> APL being compiled to Lisp is important, not just because it gives you proper interoperability out of the box, but also because on implementations like SBCL this means APL code will get compiled, through Lisp, to native code
Except that simd in lisp isn't great and needs to be done mostly by hand[1]; and simd is super important for making apl performant. Also because gc attributes that are good for lisp aren't great for apl. Also because compiling to native code doesn't actually really affect performance of apl, and might actually harm it for larger programs.
To be clear, april is a super-cool project, but not because of any of its performance characteristics.
> Now I wish I could find something that does the same for J, because I'm sure as hell not going to buy an APL-friendly keyboard.
You don't need a special keyboard, just a layout. I don't know how it works on windows (there's an 'IME', I think?) or macos, but x comes with a keyboard layout. I use this:
Which makes it so that holding down the right super key and pressing any key produces an appropriate apl symbol. You may want to use a different key to switch; check 'man xkeyboard-config' and then search for 'Switching to another layout' for other options. Try: modifier key (whatever it is)+r, for ⍴ (rho).
Beautiful that it can get actually compiled... I dreamt of that, but in particular, with J, it'd be very hard as the actual implementation is data driven, i.e., lexical analysis alone would require building a kind of interpreter for chained operations to work efficiently under all circumstances or compile all possible variants of execution.
Comparing numpy to J is like comparing a dirty rag to a designer suit. The examples you've listed are trivial; the power of J and other array languages cannot be appreciated from afar.
The philosophy behind array languages runs much deeper than adding two arrays or transposing matrices.
I have only passing experience with numpy, but I suspect the following "shift a matrix in all 8 directions", used in the famous APL/J game of life solution [0][1], would require multiple statements as well as being more verbose in numpy (I'm guessing you'd need to use "roll" multiple times?):
That requires you to break out of the array paradigm and drop back down to procedural looping. Which is what you'd expect since numpy is array operations bolted onto python as a library and J is array operations baked into the language itself.
It's more about expressiveness and the ability to stay at a "high level" than actually the ability to just do some computation that is really hard elsewhere.
After all, most programming languages can do "the same stuff" - but no one would compare Haskell to PHP.
In most of the examples in the article, the author is using just one or two verbs put together. Imagine if all of the verbs in your language worked seamlessly at the "concept" level, rather than the "write the loop" level.
Even functional constructs like map/filter/reduce take up
a lot of words. One or two characters can do the same thing, and be read faster and more idiomatically by an astute practitioner.
That's my take on it (as an on and off Q/Kdb user, not a J dude -- yet..)
It is not about "cannot be done". Most high-level languages are Turing complete, hence every program in one can be written as some program in another.
The real clincher is expressiveness: shorter code is better, modular code is better, and languages which avoid unnecessary repetitiveness of boiler plate code is better. Different languages provide advantages along one or more of these parameters.
Here's Conway's Game of Life by Arthur Whitney (creator of K):
life:{3=a-x*4=a:2{+(0+':x)+1_x,0}/x}
I'm guessing a numpy implementation would be between one and two orders of magnitude more verbose, even if you're just comparing the number of operations.
It doesn't quite seem fair to compare numpy to one of the most famous K code golf's ever created, but here is my attempt, 140 characters compared to Arthur's 30, so 5 times longer. Not worry about padding the edges would cut it to 100. I don't think the gap is as wide as you think it is.
def life(M):
MP = np.pad(M,[(1,1),(1,1)])
N = sum(np.roll(MP,(i,j),(0,1))
for i in [1,0,-1]
for j in [1,0,-1])
return (3==N-MP*(4==N))[1:-1,1:-1]
Speedtest (presumably memory bound):
100 x 100 matrix, numpy 0.2ms, K 0.2ms
200 x 200 matrix, numpy 0.5ms, K 0.8ms
500 x 500 matrix, numpy 5.2ms, K 8.0ms
1000x1000 matrix, numpy 20.0ms, K 36.0ms
5000x5000 matrix, numpy 0.5s , K 1.2s
Interestingly the K code is designed to return a boolean matrix, but operates much more slowly for me on a boolean matrix compared to an integer matrix, with the result that:
q)\t klife M
Takes 1.2 seconds whilst
q)\t klife klife M
Takes 24 seconds. So whilst the idea seems to be that klife is used with scan/over to run a number of iterations, it's actually a bad idea to run it more than once.
Exactly, one of the things I like about numpy is how it drew from APL and J. It is a more verbose language, and truth be told, J still has more expressive power (non regular processing of data, even state-machine-like processing) which can still influence numpy, or Julia.
I wish somebody would write a walkthrough of writing something like a piece of GUI to show that off.
(I keep trying to figure out J and ... sliding off ... every time I need to do something "normal" ... which is annoying and the fact I'm reduced to wishcasting in HN comments as a solution makes me disappointed in both myself and the universe)
This seems very interesting for numerical computation, but at first glance few of these operations seem to apply directly to general computation (I'm sure that there are isomorphisms that can be used to apply them indirectly).
The example with sorting was the oddest from this point of view. The author praised the idea of the permutation vector as being a relatively direct mathematical specification of sorting (return the permutation of the original array in sorted order) while leaving out the 'sorted' part. Also, this focuses entirely on the result, but there is no word on the algorithm - 'how is the array sorted' is the question that computer science is designed to solve, as opposed to 'what are the fixed points of a sorted array'.
As such, as someone with no experience of J, I'm left wondering if this is another example of the infamous Haskell quicksort - well-picked examples of code which produces a desired result are extraordinarily terse and expressive, but actually implementing well-known algorithms is often just as verbose as C, give or take.
> there is no word on the algorithm - 'how is the array sorted' is the question that computer science is designed to solve, as opposed to 'what are the fixed points of a sorted array'
'[H]ow is the array sorted' is one question that computer science can be used to solve. However, assuming it has been solved (which, for many cases, it has), a much more interesting question is 'what can we do with a sorted array'; or, 'given that sorting has such-and-such time or memory complexity, how hard is this other algorithm?' The OP isn't showing off a particular sorting algorithm, he's showing off a sorting interface, because that's more novel in context.
Sorting according to a predicate could be done with outer product/table+reduction, but usually you don't need anything like that. Let's say you have an array of 3-vectors, and you want to sort the vectors according to the 2nd item in each vector. So your sort predicate is v1[1] < v2[1]. We can just sort the whole array according to the vector of the 2nd items of each vector; so:
] x =. ?@] 9 3$10
2 4 9
3 8 2
8 0 4
3 9 6
3 1 8
2 6 8
1 2 5
9 4 0
5 0 9
(<a:;1) { x NB. index: the first element of every row of x
4 8 0 9 1 6 2 4 0
1 { |: x NB. first element of the transpose of x; equivalent
4 8 0 9 1 6 2 4 0
x /: 1 { |: x NB. x sorted according to the above. Notice that the second column is now sorted
8 0 4
5 0 9
3 1 8
1 2 5
2 4 9
9 4 0
2 6 8
3 8 2
3 9 6
My point was that general programming usually looks more like writing the sorting algorithm than simply calling it. Thus, I would be more interested in seeing real examples of how the properties of J allow you to write algorithms more concisely or otherwise better.
The article showed how J solves some mathematical problems with built in functions or their built-in modifiers. But in general programming, you won't find a function that does exactly what you wanted, maybe with a little adverb. You'll usually have to write that function yourself, to describe how it works in terms of some of those pre-existing functions.
Note that your second example could be expressed as something like
x.OrderBy(y => y[1])
In C#, assuming x is int[N][M]. I understand that J's version can work with many other shapes of X, but that is hard to appreciate without seeing example of how that comes in handy when writing some other (hopefully well known) algorithm.
> My point was that general programming usually looks more like writing the sorting algorithm than simply calling it. Thus, I would be more interested in seeing real examples of how the properties of J allow you to write algorithms more concisely or otherwise better.
Ah - I see. Here[1] is quicksort in j. John scholes's videos are excellent and very illustrative, though they deal with apl rather than j; conway's game of life[2] and sudoku[3].
> The article showed how J solves some mathematical problems with built in functions or their built-in modifiers. But in general programming, you won't find a function that does exactly what you wanted, maybe with a little adverb. You'll usually have to write that function yourself, to describe how it works in terms of some of those pre-existing functions.
Right. This is true, but you generally stay much closer to the builtins in j than you do in other language.
> Note that your second example could be expressed as something like
> x.OrderBy(y => y[1])
> In C#, assuming x is int[N][M]. I understand that J's version can work with many other shapes of X, but that is hard to appreciate without seeing example of how that comes in handy when writing some other (hopefully well known) algorithm.
Right; you mentioned sort predicates, so I was giving an example of a way in which you could sort according to another relation. My point is: being able to change the layout of your data so easily can obviate sort predicates. (Though I do think a sort adverb taking a predicate would be cool.)
Yes, once one goes really deep, it's more like C in the end. One good thing I find about J is that you can still have a clear picture in your mind about how it will be executed, without actually having to write the details yourself. It also ties with SICP lesson of developing a language over a language with right abstractions at each level.
Out of curiosity, internally does a language like J `map` over arrays even though the array is a primitive? Meaning, is `map` essentially a “primitive function”?
In K the error messages are far harder to understand than most languages, and you'd need either enough comments or many hours of experience with the language to understand what a piece of code does, due to its sheer expressiveness. Also, there's comparatively less documentation for when you're trying to do something practical. I can't imagine introducing J or K in an open source project will be helpful if other contributors have to catch up to my understanding of the language and need to internalize and remember which single-character symbols mean "print" or "iter." I imagine that at least a Python programmer could reasonably understand the gist of a similar Ruby program, merely due to the fact that "class" and "map" and "length" are sequences of characters that usually show up in both languages. Not so with K.
Still, I find the vastly different ways of doing things with K interesting to see, even if I will only use it personally.
[0] http://winestockwebdesign.com/Essays/Lisp_Curse.html