Why don't you just learn the vocabulary of patterns used in a compressed representation, like LZ77? Then you can condense any language to make it more readable for your "organic pattern matcher" just by passing the source through gzip.
Here is the thing: I have a pretty good idea of what
flatten(x):=converge(fold(concatenate,x))
does without knowing what exact language it is written in, or having to crack open a reference manual for it. This is not so of your sample of puerile line noise, which is completely arbitrary. If we feed it to, say, GolfScript, it will do something completely different.
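To make "a pretty good idea" concrete, here is one plausible reading of the expression, sketched in Python -- the helper definitions are my guesses at what those names conventionally mean, not anyone's actual library:

    def concatenate(a, b):
        # Join two values as lists, treating a non-list as a one-element list.
        a = a if isinstance(a, list) else [a]
        b = b if isinstance(b, list) else [b]
        return a + b

    def fold(f, xs):
        # Left fold over xs; with concatenate this strips one level of nesting.
        result = []
        for x in xs:
            result = f(result, x)
        return result

    def converge(f, x):
        # Apply f repeatedly until the result stops changing (a fixed point).
        while True:
            y = f(x)
            if y == x:
                return x
            x = y

    def flatten(x):
        # Reading converge(fold(concatenate, x)) as "keep concatenate-folding
        # until nothing changes".
        return converge(lambda v: fold(concatenate, v), x)

    print(flatten([1, [2, [3, [4]]], 5]))   # [1, 2, 3, 4, 5]

Even if my guesses are off in detail, the names carry enough meaning to get to "flatten nested lists one level at a time until done", which is the whole point.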
Also, if I delete a ) character from the above expression, I can guess that it is probably not well-formed:
flatten(x:=converge(fold(concatenate,x))
       ^ not matched
flatten(x):=converge(fold(concatenate,x)
                    ^ not matched
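That check can even be mechanized without knowing anything about the language's semantics. A minimal Python sketch of the delimiter scan a reader does by eye (the function and its name are mine, purely for illustration):

    def balanced(expr):
        # Scan left to right, tracking how many '(' are still open.
        depth = 0
        for ch in expr:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:          # a ')' with nothing to match
                    return False
        return depth == 0              # any unclosed '(' left over?

    print(balanced("flatten(x):=converge(fold(concatenate,x))"))  # True
    print(balanced("flatten(x:=converge(fold(concatenate,x))"))   # False
    print(balanced("flatten(x):=converge(fold(concatenate,x)"))   # False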
Then, names like flatten and converge are atoms. We could condense this code by giving them single-character names (using some Unicode glyphs, perhaps, or whatever). Let's just use Roman capitals:
F(x):=V(R(C,x))
It's not actually that long in the number of tokens or atoms.
Anyway, most computer scientists and software engineers stop measuring code by character count by the time they get out of puberty.
Nobody wants to write proofs as English prose like they did when first proving basic algebraic properties. Yet those prose proofs are much easier to guess at the meaning of, even if you don't fully get them, than the same arguments in our standard mathematical notation. It just so happens that our notation allows middle schoolers and high schoolers to treat these proofs as obvious and trivial, whereas they would have no idea what to make of the English proofs that are supposedly more "natural" to an English speaker. And the notation they use with such facility (even if they aren't particularly competent with it and only learned it last week) has no relation to the English language, carries none of its intuitions, and makes no attempt to.
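To make that concrete with a stock example (roughly the style in which early algebra actually was written): compare

    "the square of the sum of two quantities is equal to the square of the first, plus twice the product of the first and the second, plus the square of the second"

with

    (a + b)^2 = a^2 + 2ab + b^2

Anyone who has learned the second form finds the first a chore, even though the first is "natural" English.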
Yet, I challenge you to back up your statement about character counting by showing me a Computer Scientist and a Software Engineer who no longer writes or uses traditional mathematical notation for anything, but has systematically moved over to the more intuitive and natural programming language of their choice, such as Scheme, C, or Haskell.
APL is a suitable replacement for Mathematical notation and was designed as such. Other programming languages are not. That's a very important element.
It's not about character counting. See Iverson's "Notation as a Tool of Thought" Turing Award lecture.
You might have a stronger case if, since the advent of computer programming languages (which are demonstrably capable of representing a whole host of mathematical ideas and concepts directly), people had begun flocking to them to replace mathematical notation when communicating ideas in non-computer-science contexts.
However, people rarely consider a programming-language equivalent superior to traditional mathematical notation when they are talking amongst themselves in front of a whiteboard.
>Nobody wants to write proofs as English prose like they did when first proving basic algebraic properties.
You'd be surprised. Mathematical proofs contain lots of "English prose".
>Yet, I challenge you to back up your statement about character counting by showing me a Computer Scientist and a Software Engineer who no longer writes or uses traditional mathematical notation for anything, but has systematically moved over to the more intuitive and natural programming language of their choice, such as Scheme, C, or Haskell.
That's irrelevant. Best tool for the job and all that.
>APL is a suitable replacement for Mathematical notation and was designed as such.
That doesn't make it suitable for programming in general.
And mathematical notation is much richer than APL; it's not constrained to a mere sequence of characters on a line.
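For a concrete, if rough, illustration: the average of x_1 .. x_n as written on a whiteboard,

       n
       Σ  x_i
      i=1
    ---------
        n

uses two-dimensional layout -- limits stacked around the Σ, the division drawn as a horizontal bar -- while any programming language has to linearize it into a single row of characters.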
Some mathematical proofs contain lots of "English prose", but not in fields that have nailed down the semantics; and no field that has done so ever goes back to prose.
Mathematical notation is not automatically suitable for programming in general, but it is not automatically unsuitable either.
APL was devised as a notation for algorithms, not for general math; Iverson noticed that at the time (the 1950s) everyone was inventing their own notation for specifying algorithms when giving a mathematical proof, and he tried to unify that. And since it was well enough specified, someone later wrote an interpreter for it, and that's how APL came to be more than notation. I urge everyone to read Iverson's "Notation as a Tool of Thought".
> Best tool for the job and all that.
Best tool depends on context. No programming language is the best tool if the person reading/writing/maintaining it does not know the basics of how to program.
APL is suitable for programming in general for people who take the time to learn it, and is often the best tool for the job, as long as the context is "to be read/maintained by other people who know APL". IMHO, this is an unfortunately rare context.
No software development in any language should be undertaken by people who don't understand the language and surrounding tooling and libraries. No program in any language should be written with the expectation that it can be maintained by people who don't know the language. This is not some APL predicament.
arcfide's answer is excellent; there's another aspect I like to present in response to this kind of rant (which invariably comes up in discussions of APL-like languages):
When people write code in Pascal/Java/C/C++, it is considered bad style if it is not indented properly (where "properly" is the subject of religious wars, but there is near-universal agreement that deeper scopes should have more indentation). That's not an issue with any one language per se; it is pervasive among all the languages that did not eliminate the question a priori (as Python and Nim do).
The compiler doesn't care; but programmers do, because we read code a lot more times than we write it, and the visual structure of the code informs us of (some of) its semantics, e.g. "this indented block is dominated by a condition specified in the lines above and below it; now let's see what the condition is, and whether it's an "if" or a loop of some kind".
And the real reason programmers do care is that it makes the code easier to read "at a glance" -- using our efficient visual system to quickly recognize patterns immediately.
APL/J/K take this principle to the extreme - because they allow whole (useful!) functions to be compressed down to a few characters, they cut out at least one, often two or more, of the bottom levels of abstraction - one doesn't need to abstract the computation of an average into a routine called "average" when the entire implementation (in J: +/ % # ) is shorter than the word "average".
One might argue that it is still better to name this; however, +/%# has the distinct advantage that it also conveys the behavior on an array of length 0 (what does your "average" function do? you have to read the source code or the often-incomplete documentation), the behavior on a 2-D matrix (it gives a column-by-column average), and so on.
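For readers who don't know J, here is a rough Python/NumPy sketch of the behavior being described -- a literal translation of "sum along the leading axis, divided by the tally of that axis", not J itself, and the empty-array case is my reading of J's 0 % 0 convention:

    import numpy as np

    def average(x):
        # Literal translation of the J phrase: +/ sums along the leading axis,
        # # tallies the leading axis, % divides.
        x = np.asarray(x, dtype=float)
        n = x.shape[0] if x.ndim else 0
        if n == 0:
            return 0.0    # J gives 0 for 0 % 0 (if memory serves); NumPy alone would give nan
        return x.sum(axis=0) / n

    print(average([1, 2, 3, 4]))      # 2.5
    print(average([[1, 2], [3, 5]]))  # column-by-column: [2.  3.5]
    print(average([]))                # 0.0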
There is a learning curve, for sure - but just like mathematical notation, once you have learned it and used it, it is concise, readable, pattern-matchable-with-eyes, and no one goes back to "let us multiply x by itself y times" when they are familiar with the notation "x^y".
Also, C++ is significantly more popular than the programming language "ADD ONE TO COBOL GIVING COBOL" for the same reason.
You have a small point there: if a calculation is made up of a four-symbol idiom, it bears repeating in the code rather than hiding behind a definition. However, that doesn't extend to arbitrary length. There is a threshold at which it's better to have a definition. I wouldn't copy and paste a sequence of 15 symbols, say. Somewhere between 4 and 15, there is a "knee".
Fact is, large software is made up of definitions. Vast numbers of them.
You can copy and paste everything only in tiny programs for your own use.
Programs are written across multiple lines not just because of indentation, but because we refer to code, and to changes in it, by line. Version control tools currently, by and large, work at line granularity. In code reviews, we refer to lines: "everyone, please look at line 72".
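A hypothetical side-by-side (Python, nothing to do with APL itself; the names are made up), just to illustrate the granularity point:

    # One dense line: any change shows up in a diff as a rewrite of the whole
    # thing, and a review comment can only point at "the line".
    def mean_dense(xs): return (sum(xs) / len(xs)) if xs else 0.0

    # Spread over lines: version control shows which branch changed, and a
    # reviewer can point at the early return specifically.
    def mean_spread(xs):
        if not xs:
            return 0.0
        return sum(xs) / len(xs)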
> using our efficient visual system to quickly recognize patterns immediately.
Not everyone has this; maybe only a few otherwise unemployable code-golfing idiot savants.
Maybe I could recognize +/%# if it is on its own. In real code, it will be in the middle of other such cuneiform, because it has to obtain inputs from somewhere, and there is other code expecting its outputs. I will not easily spot +/%# in a ream of similar $%#={|: type stuff.
> what does your "average" function do? you have to read the source code or the often-incomplete documentation
That just shows you're coming from an environment where you expect user-defined code to be poorly documented.
Of course, people don't bother to document their page-long code-golfed line noise that is only for their personal use.