Somebody, somewhere, wrote the Text::LevenshteinXS module. Somebody, somewhere, had to write sed, awk, head, sort, tr...
It's all fine and dandy to say "look at these awesome tools that make tasks like these trivially easy. See how powerful Unix is". But this fails to consider that somebody, somewhere, has to be a tool writer, not just a tool user. Knuth's code was a tool writer's code, exemplifying a technique (Literate Programming) that is aimed at long form code writers in general.
As with others before, the author fails to grasp that this is an apples to oranges comparison.
This is one of those stories I glance at briefly out of a kind of mildly masochistic fascination ("are people really going to argue about this" "you know damn well they are").
I don't think there is such a sharp distinction between writing and using tools. You could take the 9-command pipeline, paste it into a shellscript, and now you have a new tool.
And isn't this how most programming is done? Typically your program relies on a set of libraries, but the methodology of writing the library looks very similar to the methodology of writing the client code; at the end of the day, it's all programming.
If literate programming is great, then I would expect it would be great everywhere. And if decomposing your program into a few steps and piping together already-written software for those steps is great, then I would expect you should try to use that style of programming as widely as possible.
When the task was to make a tool, the criticism should not be that there are other tools to do this.
So, yeah. If you want a chair, you just go buy a chair. But if you are curious if a method of furniture making is good, you don't take as evidence someone that orders a chair from a catalog.
> you don't take as evidence someone that orders a chair from a catalog.
That's misrepresenting what's going on.
A better parallel is that the person curious about furniture-making still isn't doing everything from scratch. They're gonna go buy wood and tools - they're not forging an axe, hammer, screwdriver, screws, and nails, or chopping down a tree and shaping the wood themselves. The furniture-maker is still following the unix philosophy of using and building on prior work.
Depends how you read it. He offered to demonstrate a tool, literate programming. The task he was asked to demonstrate it on was counting words. He was not simply asked to count words; that is different.
It was a purposely pedagogical exercise to demonstrate literate programming, which is itself a pedagogical method of writing a program.
To that end, by reading his program you can learn how he wrote his program. Reading the shell script really only tells you which commands were strung together: if you don't know what those commands do, you will not learn it from the script. I don't know Pascal, but most of the rest still makes sense to me, enough that I could probably port it. The shell script?
Now, again, if the task is just to count the words, you are probably fine with whatever crosses the finish line. If the task is to demonstrate literate programming? How does that help?
> As with others before, the author fails to grasp that this is an apples to oranges comparison.
I think comparing apples to oranges would be comparing Knuth as a programmer to Spinellis (the author of the blog post) as a programmer; they obviously have different approaches to programming, which one can't compare without bringing in some externally imposed value judgements.
The post itself isn't a comparison at all: it says "this task, which was claimed to be very difficult using only a certain set of tools, is actually easier than one would expect even when restricted to those tools." (OK, I guess 'easier' is a comparison, technically, but only to expectations.) This says nothing about the quality of Knuth, or of Spinellis, or of the ideologies of tool use and tool creation; it says only that the existing Unix toolkit is very rich, much more so than some might expect.
Those authors (of the Perl module and other software) may have worked from existing source material from other authors, too. In fact I'd guess it's pretty likely. Research into this could even promote a much more inclusive and informed approach to technology use and development.
(Until then though, perhaps we'll keep having this use-your-own-brain vs. don't-reinvent-the-wheel discussion)
Just like apples and oranges can be compared as foods (and fruits), bespoke vs. reused solutions can be compared as software. The reused parts are of higher quality than their corresponding bespoke implementations and can be composed to accomplish the same task as well as many others. It's a powerful lesson.
Most of the components are multi-platform, partly or fully POSIX-standardized, battle-tested, blazing fast, etc. It's some of the most widely-used and arguably greatest software ever written.
And some of it is absolutely terrible, with several tools overlapping in use cases and unclear performance hints. Even worse, the combination itself can perform worse, which may not matter on small inputs but has a huge impact when you crank up the size. Then there's the issue of understanding why performance suffers and in which cases it's better to roll your own custom solution that better fits the problem.
Reuse saves work when your problem maps perfectly onto existing tools; when it doesn't, what may seem like a minor difference propagates through your program and ends in countless issues rooted in a codebase you don't know.
It takes real stones to suggest Unix utility code is better quality than Knuth's handicraft, especially without having looked into much of it. Unix code usually works well enough on unchallenging input.
By some indices (simplicity, documented-ness, accessibility {as in: I can read and understand it without learning more than one language's behavior vs. bash/C/several others}), Knuth wins hands-down.
By others (generality, speed of implementation) he does not.
It is normal to point out at this point that if correctness is not important to you, an overwhelmingly more quickly produced implementation is possible.
But my comment was on remarks claiming superior reliability for the Unix utilities, which you have not addressed.
I came here to basically say the same thing ... if Knuth weren't demonstrating Literate Programming (LP), he would have used a library version of the Trie function that took up most of those eight pages. At that point, who cares whether those six pages were done in LP or not ... it might be easier to maintain in the long term but either way it's code you don't have to write. If he was finding the Levenshtein distance as with this post, maybe his LP system could use the Perl library too?
1. I can't speak for Hillel Wayne who used the word "framed", but I didn't understand his newsletter post as Bentley having "framed" Knuth -- I understood his post as pointing out that in the popular imagination/folklore, the story had mutated over the course of years from the original setting (a program that Knuth was asked to write in WEB specifically as that was the point, and a review of that program by McIlroy evangelizing the then little-known Unix philosophy) to a "framing" where two people were competing to solve the same problem with the same available resources, and one of them did it in a "worse" way. (Also left this comment on the blog post above.)
2. Here's a comment on the previous thread from someone who says they read the column when it was posted, and their reaction they say was one of cringing -- so at least at that time it probably wasn't perceived that way: https://news.ycombinator.com/item?id=22418721
3. Much of the space taken by the literate program is for explaining a very interesting data structure that we could call a hash-packed trie (AFAICT, devised on that occasion for that problem -- a small twist on the packed tries used in TeX for hyphenation, and described in the thesis of Knuth's student Frank Liang). One cannot obtain this data structure by combining other programs, only by combining other ideas. (I mentioned this in the previous thread as well: https://news.ycombinator.com/item?id=22413391)
4. So as far as evaluating literate programming goes, the real question (and the answer is not obvious to me!) is: if you're going to write a program that uses a custom data structure (like this), how should you organize that program? Should you write it as Knuth does, or as a conventional program (like I tried to do with my translation: https://codegolf.stackexchange.com/a/197870)? And as for estimating the value of a new data structure in the first place: as of now (at that question), solutions based on a trie are about 200 times faster than the shell pipeline, on a large testcase. (The hash-packed trie, which Knuth calls "slow" in his program, is not so bad either, and it does economize on memory a bit.)
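For concreteness, here is a deliberately naive sketch of the "custom data structure" idea: a plain dict-based trie for word counting, in Python. It is nothing like Knuth's hash-packed trie and makes no performance claims; it only illustrates the kind of structure you get by combining ideas rather than programs:

import re
import sys

def top_words(text, k):
    # Build a trie: each node is a dict mapping a character to a child
    # node; the special key "#" holds the count of the word ending here.
    root = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#"] = node.get("#", 0) + 1

    # Walk the trie and collect (count, word) pairs.
    pairs = []
    def walk(node, prefix):
        for key, value in node.items():
            if key == "#":
                pairs.append((value, prefix))
            else:
                walk(value, prefix + key)
    walk(root, "")

    pairs.sort(key=lambda cw: (-cw[0], cw[1]))
    return pairs[:k]

if __name__ == "__main__":
    for count, word in top_words(sys.stdin.read(), 10):
        print(count, word)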
I have my own answer for #4 (which, to me, is the only interesting question about this affair). I've actually done a fair amount of literate programming on my own, although I only have a couple of examples that one can look at these days. Here is a small library for fluent matcher system for Jasmine and React: https://github.com/ygt-mikekchar/react-maybe-matchers/blob/m...
You will see that I've included yet another monad tutorial :-) I don't link to this as a way of saying that I think this is a good example of LP. It's not really. I was experimenting quite a lot. However, I can tell you one thing about it: it is practically impossible to refactor.
As a result, I decided that LP is not particularly good for working on living programs. Or, at least, it is not conducive to my style of programming, which encourages refactoring. Nothing I write is "frozen". It is all in flux and so the value of documentation is transient. Additionally, it is rare that a programmer wishes to read code from the top to the bottom. If they ever do, it's usually the first time they have read the code. After that, they will want fast access to the parts that they want to modify. Sorting out the code from the text becomes difficult. If you make a change, you also have to review all of the text to make sure that you haven't clobbered something that is referenced elsewhere. It will work well for something short, but it's not great for large projects.
I still do LP style things. Here is an unfinished blog post on ideas about OO: https://github.com/ygt-mikekchar/oojs/blob/master/oojs.org However, to contrast with this, I would invite you to look at https://gitlab.com/mikekchar/testy where I put some of those ideas into action (especially see the design.md and coding_standard.md documents to show what constraints I chose in this experiment). Crucially, after this code had run its course, I'd changed a lot of my ideas and never went back to my blog post. For me, the actual code is far more instructive than the blog post ever was. Of course, I'm the author, so I understand what I was trying to say and I only need a quick peek at the code to remind me what I was thinking.
For me, that's the dilemma of LP: once you know what you want to know, the text is in the way. New people will benefit from the Gentle Introduction (sorry, couldn't resist the TeX reference...), but 99% of the time nobody will benefit from it. Is the other 1% of the time worth it? It may be, actually, but boy is it hard to convince yourself of that!
Thank you, the voice of experience counts for a lot, and I'm glad to hear from a rare person who has actually tried LP seriously (I'm not one of them!). I'd like to dig deeper for your thoughts on a couple of your interesting points:
• Living programs: You mention the point that you find LP hard to refactor, because things written tend to feel "frozen". But writers do often mention ripping out several chapters of their books or carrying out extensive rewrites in response to editors' feedback etc. (Though some don't: look for the second mention of "Len Deighton" in this wonderful profile of the editor Robert Gottlieb: https://web.archive.org/web/20161227170954/http://www.thepar...) Conversely, for those of us without much writing experience, I wonder whether literate programming may train us to become better writers, in the sense that programming (which inevitably tends to require rewriting) may make us more comfortable with doing major rewrites of our work. (Or at the very least, it may train us to chunk our code with an eye to which parts are likely to be changed together later, parts which might otherwise sit far apart in the code.)
• Linear reading versus fast random access to code: I think it's very much true that after (or even during!) the first reading, one wants fast access to relevant sections of code, and not to read it from top to bottom. But books are also designed for random access. (The first piece of advice here: https://www.cs.cmu.edu/~mblum/research/pdf/grad.html) Many of the contrivances of Knuth-style LP (the cross-references, the indexes, the table of contents, the list of section names at the end, for that matter even section numbers and page numbers) seem designed to facilitate this. (See the TeX program at http://texdoc.net/pkg/tex especially the ToC on the first page and the two indexes at the end; the printed book also has a mini-index on each two-page spread, which is missing here.) In fact, I'd imagine that even if all that you used LP for was to organize the code in a way that better facilitates random access (e.g. just add section names to your code blocks, or move error-handling code to a separate section to be tangled-in later) it alone may prove worth it.
• Documentation versus code: In one of your examples, you seem to be writing exposition / documenting the (user-level) purpose of the code at the same time as programming. Do you find this to be the case often? My experience with LP is mainly with attempting to read the TeX program, which on the first page says "[…] rarely attempt to explain the TeX language itself, since the reader is supposed to be familiar with The TeXbook" (the TeX manual). And for the most part, whatever text is in the program is about the code itself, things that still matter once you know the program already. (This is in fact my struggle with it, it's not written like a novel; all the text is oriented towards details of the program code itself.) As that's a large example, pick a small one like this: https://github.com/shreevatsa/knuth-literate-programs/blob/m... -- there is an intro page about the problem and cache size etc., but most of the rest of the text seems comparable to what one might write as comments even if not doing “literate programming” as such. So the main difference LP is contributing seems to be with code organization (what one might otherwise do with functions). In fact, probably most of us modern programmers wouldn't consider it the best way to organize this program, but it's interesting to consider what the author's intent may be with organizing code that way.
> Here is a small library for fluent matcher system
It doesn't seem like Literate CoffeeScript lets you reorder the code blocks which, as I understand things, is the fundamental part of Literate Programming - code follows documentation, not the other way around.
(Although, to be fair, 99% of things I've seen labelled as LP aren't either. There's only WEB, CWEB, and noweb I can think of that'd count just now.)
Yes, you are absolutely correct, and it's definitely a big problem from an LP perspective. Babel gives me a bit more leeway, though. But from the perspective of "is this worth it", not having it reordered actually makes it easier to work with, in my opinion. I think if you had tools that allowed you to work with the generated code and jump back and forth to the sources, it might be OK.
The rebuttal itself is a lot more comical than any joke: literally countering Knuth’s “code should read and flow like prose for full understanding” with “f you, look at what I can do with my 20 years of hacking and thousands of lines of code I’ve never seen, written by someone else”.
I really like the idea of the Unix model as well, but you're not going to be able to use it effectively to write an actual application. If you're writing a word processor and you need to find the levenshtein distance between the most frequent word pairs (maybe some measure of how alliterative/consonant/assonant your document is?) then you're probably not going to be building the word processor using the Unix model, and even if you are (the closest you can get is probably using Tcl/Tk?) then it's still best to write out what you're doing as clearly as possible. Note that it took me about 5 minutes to figure out what the shell pipeline presented in the article actually does, and multiple times my reasoning about it led me to think "wait, does this actually do what it's supposed to do?"
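For contrast, here is roughly what "Levenshtein distance between the most frequent words" looks like when written out explicitly. This is a Python sketch of the task as I described it above, not the article's pipeline or code, using the textbook dynamic-programming recurrence:

import re
from collections import Counter

def levenshtein(a, b):
    # Classic dynamic-programming edit distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def frequent_word_distances(text, k=10):
    # Distances between every pair of the k most frequent words.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    words = [w for w, _ in counts.most_common(k)]
    return {(a, b): levenshtein(a, b)
            for i, a in enumerate(words)
            for b in words[i + 1:]}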
A word processor is anti-Unix on its face. If you want the Unix equivalent, look no further than vi and TeX. With vi you can pipe your document through a spellchecker such as gnu aspell [1].
Thank you for adding to the considerable weight of literature completely missing the point of Dr. Knuth's literate software, which is:
tr, sort, uniq, and sed, should all be literate programs.
They would be easier to read, reason about, modify, and extend. At this point, tooling for literate programming lags considerably compared to illiterate programming, and that's entirely because of the determination to miss the point exhibited here.
The PDF still doesn't help much. The expository style of breaking out inner code blocks from their call site harms the ability to understand what's happening. It's nearly impossible to follow in the raw source. Hyperlinks don't improve matters much, and the PDF rendering doesn't have a rational layout for details like numeric tables.
The first implements a programming language and typesetting system, while the second just swaps characters. I'm not sure it's a fair comparison. (Additionally, the TeX source is also meant to be formatted, not read raw.)
The TeX source is not in the form in which it is intended to be read. It's as if you showed the current HTML and JavaScript source code of some article and complained that the message is hard to read.
What you showed as the TeX source is at the same time a source representation for this book:
Computers & Typesetting, Volume B: TeX: The Program (Reading, Massachusetts: Addison-Wesley, 1986), ISBN 0-201-13437-3
and at the same time the source from which the "plain" Pascal program can be extracted.(1)
That was the idea of Literate Programming, which Knuth also tried to demonstrate in the article for which he was "framed."
Which other comparably hard-to-develop program (this one took the best programmer in the world, supported by his students and assistants, 10 years) has a nicely printed book form that fits in 600 pages and contains all the descriptions?
Even more impressive, Knuth intentionally developed his program with the specific goal that its output stay the same no matter how much computers change in the future. And he managed to achieve this -- ports of his original program are available everywhere, and using his sources from the eighties produces exactly the same pages as it did then.
------------
1) "WEB programs are converted to Pascal sources by tangle and to a TeX input file by weave. Of course, tangle and weave are WEB programs as well. So one needs tangle to build tangle---and weave and TeX to read a beautifully typeset WEB program" -- that is, if you don't buy a book which is already typeset and printed.
Also, from my point of view, I am MUCH MORE conditioned to read the second variant of the code.
My third point is that the TeX source must be read in PDF form, not in TeX form. I would even go further: TeX source code should be manipulated in PDF (read: readable) form, not as source code per se.
Literate programs are essays. They might be easy to read and reason about, but not to modify or extend. A computer program is not best understood and managed as a linear artifact. Much of its power is in its graph nature.
Documentation comments are great, but that's not the same as literate programming.
I don't think a literate program is more or less linear than the source code that is extracted/tangled from it. Both artifacts have a sequence: for a C program, the tangled version would put the #includes before declarations before definitions, for example.
In that sense, the LP program is an alternate linearization of the program, in that the authors can choose the order in which to introduce the program. But few LP programs are naively linear -- they typically impose a tree structure on the code, made up of labelled sections and subsections. Readers don't have to start at line 1 of the program/essay, they can navigate from the table of contents to the section of interest.
A compelling argument for LP is that it's an additive technology. If you don't want to read the essay, that's fine -- just tangle the code, and read the source-code artifact instead. With the right tooling (which admittedly may not exist!) an IDE could let you edit the tangled version directly, and put your edits back into the "essay" at the right places, so round-trip editing would be feasible.
I think I understand part of the problem. Many "literate programs" aren't literate in Knuth's sense; they are merely inversions of the conventional model, where text is the default and code is the special case that has to be demarcated. Things like the literate markdown I've seen typically read like a regular program with extra text:
# A Literate Program
This is a literate program, the language is C. We'll
begin with the includes because that's what C has at
the start of every C file, and not because it makes
any sense for the presentation:
```
#include <stdio.h>
...
```
Here are the declarations, you can ignore these for
now.
```
int main();
double square(double x);
```
Now that that's out of the way, ...
If that's all most people see, then they haven't actually seen the benefit of LP: being able to push that boilerplate stuff to an appendix so no one has to see it unless they're changing the libraries used by the system or some other thing that's important but less essential to the understanding that LP tries to promote.
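Compare a rough noweb-flavoured sketch of the same kind of fragment, with the chunks reordered so the interesting part comes first (the chunk names and the little Python payload here are invented purely for illustration):

This program prints the squares of the numbers it reads.
The whole file is assembled from named chunks:

<<squares.py>>=
<<imports>>
<<main logic>>
@

The part worth reading comes first:

<<main logic>>=
for line in sys.stdin:
    print(float(line) ** 2)
@

And the boilerplate is banished to an appendix nobody has to read:

<<imports>>=
import sys
@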
That's an excellent point. I said "most literate programs aren't linear" in my comment... But I wasn't considering the low-effort linear style that many people actually use, so I'm probably wrong on that. :)
"Low-effort linear literate" is a useful style, but I think it falls quite short of what Knuth had in mind.
I've found it to be a useful bootstrap toward a properly literate environment, which will require considerable tooling support to provide a reasonably modern experience.
Happily, we have the Language Server Protocol now, so many of the key components are already in place...
It's not fair to compare 40(!) years of advances in tooling surrounding the pile-of-files approach to software, to the somewhat withered on the vine approach embodied in literate programming.
It's a road not taken, and I think that's a pity, so I'm doing something about it <shrug>
> they typically impose a tree structure on the code, made up of labelled sections and subsections.
But most non-trivial programs don't have a tree structure, but are a full directed graph. Methods calling other methods, classes inheriting from other classes or implementing interfaces, etc. Programming and debugging could involve traversing and modifying this graph in almost any order, and does not lend itself to one preferred linearization.
An LP style that somehow reflected the various graphs of a program (control flow, inheritance, etc.) might be very interesting! I'm not sure what it would look like, but it sounds like a starting point for experimentation.
A big program could be broken into modules -- as we already do -- and each module (or its sub-modules) could be documented in a literate style, independently from the other modules. Maybe the graphs of the program (or at least, the graphs of its highly-connected modules) could be presented as alternate trails, indices, tables of contents, etc., each with its own accompanying narrative overview. It sounds gnarly, but not impossible! On the other hand, losing linearity altogether seems to be an anti-goal for an inherently narrative programming style like LP. If you're telling a story (about code, or anything else), eventually you have to put one sentence before the next. At some level of granularity, you have to commit to straight lines.
When he wrote his book, it's clear that Knuth was very much aware that LP was an unusual, and possibly crazy, idea. The book is written in a humble style, like an invitation to explore a design space with him, and not as a prescriptive text. For example, I think the fact that he included McIlroy's full critique in the book speaks to his intentions. I guess my point is that Knuth would probably love that you're challenging his ideas and exploring the design space with him, rather than dismissing LP outright.
> But most non-trivial programs don't have a tree structure,
Pretty much all programs have at least one tree structure that covers the entire program (the AST), though that may or may not be the most interesting view of the program structure.
One such program that I know of and use in conjunction with pandoc is enTangleD[1], which supports editing on both sides (the literate source and the tangled code) at the cost of comments embedded in the source to keep track of blocks. All that's really required is folding of those comments to get halfway to the desired IDE.
> Literate programs are essays. They might be easy to read and reason about, but not modify or extend.
What's your rationale behind that last part? Have you ever used any notebook-styled interface like with Mathematica or Jupyter? It's perfectly feasible to tear out a chunk of material and replace it with something else, if you've organized it well. This is no different than the same constraint for conventionally written software. You can't easily refactor shitty code. You can't easily refactor shitty literate code. If you organize your code well and write quality code, whether in the literate or conventional style, you can refactor it with relative ease.
> A computer program is not best understood and managed as a linear artifact. Much of it's power is in graph nature.
And how does that conflict with literate programming? Literate programming permits the reorganization of code into any arbitrary structure. Which means that the graphical nature of the code can be made even more obvious than in most conventional languages. You're no longer bound by single-module files (see Java) or other arbitrary textual constraints. You can place the code in the place that makes the most sense for explication. Or put it adjacent to where it's used, even if it ends up tangled in a different file.
The Effective Debugging book is a must-read for any software developer. The Elements of Computing Style is useful for any knowledge worker. Code Reading is probably the only important book on the subject.
The only problem with his books is that they are rather expensive, especially for developers who don't earn in dollars :-(
That is indeed a pity. I try to compensate by making as much material as possible openly available, such as through the MOOC you mentioned (I've been working for five years on it), through my blog, and through open source software and content.
The author did it in one line of Perl, using an existing library. How is that different from using awk? Yes awk is widely deployed but so is CPAN. In any case deployment isn’t part of the argument for using the UNIX philosophy.
> ... [a] more practical, much faster to implement, debug and modify solution of the problem takes only six lines of shell script by reusing standard Unix utilities.
Unlike tr, sort, uniq and awk, perl is not a standard Unix utility. Not only that, but Text::LevenshteinXS is a module that must be downloaded separately.
It's still far more convenient than Knuth's work, and it follows Spinellis' reasoning about the Unix mindset, but Spinellis' Levenshtein example doesn't actually support Mcllroy's original argument.
As far as I can see, it's roughly "Data structures are hard, so let's pretend everything is ASCII text. Now we can use a really difficult systems programming language (C) to build functions with weird calling conventions ("tools") and glue them together with an awful scripting language (sh)."
'awk' fits into this framework awkwardly. It implements a restricted pattern (go line-by-line, match actions to lines), it doesn't want to be a full programming language, even though it really is.
But 'perl' is a programming language, and it wants to be one. Once you have 'perl', what is the point of using a reasonable scripting language (perl) to build functions with weird calling conventions and gluing them together with an awful scripting language? You're better off writing functions(!) with normal calling conventions (a library) and gluing them together using the good scripting language.
That logic, taken to its conclusion, replaces the shell with a clean language, encourages libraries instead of "tools", and embeds the 'awk' pattern into said language instead of relegating it to an incomplete secondary scripting language. In one word: 'scsh'.
I believe it is the idea of writing small tools focused on doing one thing well with reusability in mind as opposed to writing larger complicated tools that do multiple things.
A cargo-cult philosophy never adopted by commercial UNIX clones and adored by UNIX FOSS, where the man page of each GNU tool, describing the available set of command-line arguments, looks like a kitchen sink.
> In my everyday work, I use Unix commands many times daily to perform diverse and very different tasks. I very rarely encounter tasks that cannot be solved by joining together a couple of commands.
Others just use a REPL instead, where tr, sort, uniq, and sed get to be function calls with a threading macro.
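Python has no threading macro, but the same "commands become function calls" point holds at the REPL. A quick sketch of the word-count task there (the file name is picked arbitrarily):

>>> import re
>>> from collections import Counter
>>> # re.findall plays the role of tr; Counter.most_common of sort | uniq -c | sort -rn
>>> words = re.findall(r"[A-Za-z]+", open("/etc/fstab").read().lower())
>>> Counter(words).most_common(5)   # top five (word, count) pairs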
The UNIX shell is a primitive REPL, without the capabilities of the REPLs developed at Xerox PARC, TI and Genera, regarding structured data, debugging tools, function composition, inline graphics, ability to directly interact with OS APIs.
A car without infrastructure is just a fancy box.
A chariot without infrastructure is a rideable horse.
A rideable horse in the age of broken, disparate, infrastructure.
But, at the end of the day, it all depends on what you're trying to accomplish. I use repls, shells, notebooks, etc, on a regular basis. Unix tools solve some problems. Repls solve other problems. Notebooks another. What's important, to me, is to be able to make the most out of them all, despite their flaws, because they're simply the tools that we have in our toolchain. It would be a shame to not learn our own tools, when they can offer us so, so much.
No need for language snobbery - the sanity of the "language" isn't what's being discussed here. We're talking about the capabilities of unix tools within domains where they'd be used. If you want to use a repl within your domain, that's your choice, but understand that in doing so, you're working with a relatively limited domain compared to those within the reach of unix binaries.
The threading macro just represents function composition, while a Unix pipe represents buffered streaming I/O (or composition of dataflow operators, if you like). Two related but quite different things.
It can, but only at the cost of complecting source and sink.
By default, function composition is eager: a function is expected to do its work, and hand the whole return value off to the next function.
We can make this lazy, by setting up an iterator and handing this off. At some expense: our next function must expect an iterator, and therefore can't handle a full data structure. At minimum it must coerce those into an iterator when encountered.
Also, it gets awkward to reason about iterators wrapped in iterators wrapped in iterators, even with a debugger, you get action at a distance, where the fourth function in your thread is failing because the first iterator of three has a flaw in it.
Shell pipes handle all of this for the user, with sensible defaults which can be overridden and modified for special cases. It's a powerful abstraction and I wish more languages offered something like it.
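To make the contrast concrete, here is a minimal Python sketch of my own (not tied to any particular library) of eager versus lazy composition; the lazy version streams items one at a time, which is roughly what a pipe gives you for free:

import sys

# Eager composition: each function finishes its whole job and hands
# back a complete list to the next one.
def words_eager(lines):
    return [w for line in lines for w in line.split()]

def lowercase_eager(words):
    return [w.lower() for w in words]

# Lazy composition: each stage is a generator, so items stream through
# one at a time -- but now every stage must agree to speak "iterator".
def words_lazy(lines):
    for line in lines:
        yield from line.split()

def lowercase_lazy(words):
    for w in words:
        yield w.lower()

for w in lowercase_lazy(words_lazy(sys.stdin)):
    print(w)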
Using TXR Lisp, obtain the list of words in /etc/fstab on a Ubuntu 18 system, sort them to get identical words into groups which are represented as sublists, then sort by descending length of sublist (i.e. frequency), take the top ten, and turn that into word-frequency pairs:
This is the TXR Lisp interactive listener of TXR 232.
Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
1> [(opip (open-file "/etc/fstab")
          (record-adapter #/[^A-Za-z]+/)
          get-lines
          sort-group
          (sort @1 greater len)
          (take 10)
          (mapcar [juxt car len]))]
(("defaults" 5) ("a" 3) ("dev" 3) ("ext" 3) ("home" 3) ("opt" 3)
("UUID" 2) ("c" 2) ("d" 2) ("e" 2))
Pretty print the list obtained from prompt 1:
2> (mapdo (do put-line `@(car @1) -> @(cadr @1)`) *1)
defaults -> 5
a -> 3
dev -> 3
ext -> 3
home -> 3
opt -> 3
UUID -> 2
c -> 2
d -> 2
e -> 2
nil
I think it's entirely fair to say that Knuth's frame was to demonstrate one thing first and to implement something second. That he was ideating about the subject didn't - necessarily - prevent a successful implementation.
Certainly, to another set of eyes, the lower character count matters most, though.
I want to read the follow-up article where the challenge is to create a typeset document. Bentley's criticism includes a single-line shell script invoking LaTeX.
“Through this demonstration I haven't proven that Bentley didn't frame Knuth; it seems that at some point McIlroy admitted that the criticism was unfair.“
To me the question doesn't depend so much on whether Knuth was "framed".
The meaningful criticism leveled at Knuth's code was that it was monolithic. It's true that it was long because he wrote it from scratch, but that's not enough to force you to be tightly coupled.
Did Knuth try to make his code reusable? Was it reusable? I think those are the key questions.
That's not really a meaningful criticism. As others have pointed out, the point of Knuth's exercise was not to optimally solve the technical problem, but to demonstrate the effectiveness of Literate Programming (or the lack thereof). The technical problem was just a strawman, so that Knuth had a non-trivial program to demonstrate. With this in mind, McIlroy's pipe example isn't a critique of LP at all -- if anything, it was just a distracting advertisement for the Unix style of composing programs in the shell.
What McIlroy could have examined -- and chose not to at the time -- is whether awk, sed, tr, and friends could themselves be written in a literate style, and whether such a rewrite would have achieved the goals that Knuth was setting out for LP.
Knuth could have chosen to break his monolith into multiple, loosely-coupled programs, and then written them all in an LP style. But would that have really made the demonstration any more effective?
> Knuth could have chosen to break his monolith into multiple, loosely-coupled programs, and then written then all in an LP style. But would that have really made the demonstration any more effective?
I would say yes. Clearly loose-coupling isn't necessary for a program that small. And no, it isn't always optimal.
But I have clear memories of being asked to re-do an intro CS assignment three times because it wasn't in a properly object-oriented style. Modularity is not a necessity all of the time, but it is important sometimes. Demonstrating the potential to write reusable code seems just as important as demonstrating anything else. (If the conclusion is "LP helps clarity, but you can't write libraries", is that even positive?)
As far as I can tell, "this code has only one useful point of entry" was a key part of the anti-LP argument leveled by McIlroy. After all, isn't the goal to demonstrate that LP works for people who aren't Donald Knuth?
I think McIlroy missed the mark here. Single points of entry, and tightly-coupled code, might be reasonable criticisms of Knuth's personal style, but I don't see them as inherent limitations of the LP approach. You could write multiple useful, interesting narratives about a library's core elements -- algorithms, data structures, etc., -- and then write a simple appendix documenting the entry points / API. The style itself doesn't have to get in the way of good program structure.
My own critique of LP is really more about the act of writing itself. Many people, programmers included, just aren't skilled at it! Knuth's literate programs are interesting because he's got something interesting to say, and his writing style is engaging. But I wouldn't enjoy having to read (or maintain!) a literate program that was written by a poor writer in a dull, meandering style.
Also, Knuth seems to think that the literate style ought to make us into better programmers, simply because we're writing prose along with the code -- that the combination somehow unlocks a better understanding of the problem, how to solve it, and how to explain it to others. That sounds inspiring, but I'm not sure it's really true in the general case. Perhaps more research is needed to find out. :)
Did Knuth try to make his code reusable? Was it reusable? I think those are the key questions.
Not really.
Knuth focused on maintainability. McIlroy focused on reuse of existing code. These are very different, though both laudable, goals.
As an example consider the widely used program for mathematical typesetting, TeX. Knuth started work on it in 1978. Since 1988 it has only received bug fixes. Despite it being widely used, there have been no bug fixes needed since 2014. I cannot think of another program so widely used with such a good maintenance record.
However TeX itself reuses almost nothing.
No code that Doug McIlroy wrote has a maintenance record to match. But McIlroy was the original author of widely used programs including diff, echo, tr, and join. The combined works of Knuth are unlikely to have been reused in more ways by more people than diff alone.
> Did Knuth try to make his code reusable? Was it reusable? I think those are the key questions.
That wasn't the goal of the exercise. So how is that a key question?
The exercise was: Demonstrate WEB and literate programming with this particular problem. Knuth did that. The question, then, is whether the method demonstrated literate programming and WEB.
If you want to know whether the approach or tools work for creating reusable code and less monolithic code, then that question should be posed and a new exercise performed. The question shouldn't be posed to an exercise where that wasn't a concern; that's dishonest.
We have brainwashed ourselves with dogmatic theory. I see nothing wrong with the code. This code is not for business production. It works well for what it is. Un-brainwash yourself! It's great that Knuth doesn't have to apply for a job and go through the gauntlet because that would make Knuth un-Knuth!
There's a careful balance when using tools and libraries. It's obvious that they are a good choice sometimes, but I've been surprised at the number of times where a tool/library that looks like a perfect fit is actually not, and the whole problem needs to be reconsidered and I end up writing a lot of original code.
The premise was that piping together shell commands was “better engineering” than a computer program that captured the author's thoughts? Which is more robust for debugging, producing diagnostics, error handling and reporting, extending, and code reuse?
Laughable and sad at the same time, because ACM would publish that.
> In fact, one of the reasons I sometimes prefer using Perl over Python is that it's very easy to incorporate into modular Unix tool pipelines. In contrast, Python encourages the creation of monoliths of the type McIlroy criticized.
Python's Mario tool makes it easy to use Python code in pipelines.
It's my comment you're referring to (linked from the post). The full sentence was “BTW, TeX's error handling is phenomenally good IMO; the opposite of the situation with LaTeX.” You've reversed the meaning(!) but I stand by my original comment: I invite you to try plain TeX (instead of LaTeX) for a few weeks/months, and see how you feel about the way it handles errors.
Unlike LaTeX, where the (TeX) error messages usually appear arbitrary / incomprehensible / unrelated to what you're doing, in TeX (IMO) all the error messages are very informative and include a lot of information and give you ways to recover from your problem and poke around, get more context, etc. First you'll have to have read a manual (or I recommend A Beginner's Book of TeX by Seroul and Levy), but my claim is about the user experience in the steady state.
Of course, part of the reason is that LaTeX is much more complicated than the low-level things one may be doing with plain TeX. Another reason is that the LaTeX authors were working with severe constraints, one of which was of their own choosing: their choice of using TeX macros as a “programming language” (which it was never intended to be, and at which it is horrible). Nevertheless, a big part of the reason is that they were trying so hard to make things "easy" for the user in the typical case that they didn't care as much about ways in which things can go wrong and how surprising errors can be.
As someone who uses both languages extensively, I disagree.
You are right that Python is great for writing small tools that you can run, just like Perl.
But Python does not lend itself to writing them inline in a command line like was done here. Perl not only does, but has a number of useful features specifically added to fit this common use case. 3 of which were used in this example. (-a for autosplit, -M to load a module, and -e to have the code passed as an argument on the command line rather than having to have it saved to a file.)
Secondly, Perl lends itself to being used as a "better shell" while Python does not.
What I mean is that anything that can be written in bash can be trivially rewritten in Perl, and the program that you get tends to be substantially more maintainable if the bash script is at all complex. In such a rewrite there usually isn't a good reason to change the structure of the program and make it into a single Perl program.
By contrast Python has focused on the "One True Way" to do things, and the plumbing work for calling external commands is just verbose enough that a Python rewrite of a bash script is not necessarily better than the bash script. And furthermore it is much more likely that the Python rewrite of the bash script is much better rewritten as a Python script.
The result is that for someone who lives on the Unix command line, Perl integrates into their world better than Python does. If you have never lived on the Unix command line, the objections may sound silly. But spend months typing commands and doing the extra steps that Python requires Every Single Time, and it will get old.
(This is historically not surprising. Perl 1 was focused on generating text reports. Perl 2 moved into being a sysadmin tool. Perl wound up as a web language because it is what all of the sysadmins recommended for text manipulation to people writing early CGI scripts.)
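To make that concrete: printing the first whitespace-separated field of every line is a one-flag job with Perl's -a and -e, while the closest inline Python (a throwaway sketch of my own, nothing special about it) needs the plumbing spelled out:

python3 -c '
import sys
for line in sys.stdin:
    fields = line.split()   # what Perl autosplit (-a) does for free
    if fields:
        print(fields[0])
'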
Perl's big win for one-liners is braces syntax. Interestingly, there are already projects to add braces syntax[1].
Two of the three Perl features are also Python features, namely -m to run a module and -c to run code on the command line.
Regarding Perl being a better shell, there are modules like `doit` and `invoke` that make Python far better than perl for managing jobs, precisely because they make forking off jobs super easy.
But now that you mention it... I want to write a module to make python one-liners easy.
Yeah, perl borrows backticks from bash[3], so it's giving you syntax to do it directly, and it's long had strong support for opening a process using a very intuitive syntax.
Python's subprocess module works quite well, but gets extremely verbose[4] as you try to do anything more complex than "run a command and get the output" and has some nasty gotchas[2].
I forget the invoke syntax, but doit[1] is basically a make replacement, so calling the shell is pretty easy.
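For instance, a minimal dodo.py along these lines (the task and file names are invented for illustration; the task_* function returning a dict of 'actions' is doit's standard convention):

# dodo.py
def task_word_counts():
    """Build counts.txt from words.txt via a shell pipeline."""
    return {
        'actions': ['sort words.txt | uniq -c | sort -rn > counts.txt'],
        'file_dep': ['words.txt'],
        'targets': ['counts.txt'],
    }

Running `doit word_counts` then behaves much like a make target: the action is skipped while words.txt is unchanged.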
I'm an experienced Perl user, but I'm not as familiar with Python. In addition, I'm not really using Perl for sysadmin stuff, so I tend to try to keep stuff "within" Perl. As an example, I'd rather use the File::Find module than use backticks to invoke `find`. This has really nothing to do with functionality - I'm almost always on Linux, and the syntaxes are similarly hairy - it's just that usually you get more powerful functionality using the Perl functionality.
I use a few different languages, one of which is Python, and I use the command line a lot, and I agree that Python is too verbose for a lot of the things that I do on the command line. Therefore, Python is not something that I reach for when doing simple tasks involving pipelines and/or file operations.
I have not yet put time into learning Perl. In no small part because I was intimidated by the weirdness of some of the Perl code that I've seen. The terseness that Perl allows, and which I desire, is at once compelling and scary. For this, Perl has also earned the reputation of "Write Once, Read Never".
But let's assume that I overcome my fear of Perl. Which version of Perl would you recommend that I learn? Perl 5 or the language formerly known as Perl 6?
Perl5. Some version of perl5 is available by default on just about everything and can be counted on in a similar manner as counting on awk to be there for you.
I'm not going to defend Perl's readability; there are many opuses online in both directions. Suffice it to say that Perl is still a really good tool for certain $jobs.
For learning Perl, I used this: https://qntm.org/perl_en followed by some trial and error, followed by the book Modern Perl, then Higher Order Perl. glhf! Perl hacking is a blast
That's not exactly fair to Raku. A fairer critique (and keeping with the theme of this thread) is that Raku is less focused on integrating with the Unix command line than it is on tool building, putting it closer to Python than Perl(5) in the spectrum of things. This was a specific design influence dating back to the first days of Perl 6, so it makes some sense.
I know that Raku supporters disagree with me, but that has been my considered opinion for several years. And this has been something I put a lot of thought into.
Let me lay out the case.
What are the key ideas invented or promoted in Perl 6 / Raku that people get excited about?
- Object-oriented programming including generics, roles and multiple dispatch
- Functional programming primitives, lazy and eager list evaluation, junctions, autothreading and hyperoperators (vector operators)
- Parallelism, concurrency, and asynchrony including multi-core support
- Definable grammars for pattern matching and generalized string processing
- Optional and gradual typing
I got this list from https://www.raku.org/. It is what Raku people think is interesting about their own language. (So I don't get to bring up things I really don't like, like twigils.)
Some of these ideas are mainstream, some not. According to Tiobe (yes, not to be taken seriously but it is accurate enough), the top languages today are Java, C, Python, C++ and C#. Let's eliminate from the list of Raku features anything that is supported by at least 2 of them to come up with things that are novel in Raku while not being broadly adopted today. The list gets much shorter.
- Roles (OO programming)
- Junctions, autothreading and hyperoperators (functional programming).
- Definable grammars for pattern matching and generalized string processing
- Optional and gradual typing
How many of these will be widely adopted by top languages in 25 years? My best estimate is 1. Could be none, could be 2, I'd be shocked if there were 3.
I say opinion, but it is a fairly well educated opinion. Here is my argument about each.
- Roles. They have been around for some years. The only language where I have seen them used heavily is Perl 5. Nobody else seems excited.
- Junctions are mostly a bit of syntax around any/all which is pretty convenient already. Autothreading and hyperoperators are a cool sounding way to parallelize stuff, but getting good parallel performance is complex and counterintuitive. I don't think that this is a good approach.
- Definable grammars are an interesting rethinking of regexes, but parsing is a difficult and specialized problem. I don't see an interesting approach in an unpopular language changing how the world tackles it.
- Optional and gradual typing sounded great when it made it into the Common Lisp standard. But over 30 years later, only Python supports it of the top 5. And it isn't widely used there. I see nothing about the next 25 years that makes it more compelling than in the last 25. (Though Raku's implementation is far, far better than Perl 5's broken prototype system. But that is damning with faint praise.)
So use Raku if you find it fun. You'll get a view into an alternate universe of might have beens. But I still believe that the ideas that are new to you won't be particularly relevant to the future of programming.
-----
It is hard at this date to reconstruct what a similar list would have been for Perl 5 at a similar stage. People were excited about CPAN. Perl people kind of took TAP unit testing for granted and didn't appreciate exactly how important it was. Perl people liked the improvements in regular expressions but probably couldn't have guessed how influential "perl compatible regular expressions" would become across languages. Ideas we were excited about like "taint mode" went approximately nowhere. And some ideas that Perl helped popularize, like closures, were ones that few Perl programmers realized were actually supported by the language.
However it would be a true shocker if Raku was anywhere near as influential on the programming landscape 25 years from now.
> Junctions are mostly a bit of syntax around any/all
A quick look at Raku junctions makes me think they're basically a slightly tarted-up version of Icon's generators and goal-directed execution (which is no bad thing, of course but hardly novel.)
Junctions autothread. What does that mean? Using a Junction as an ordinary value will cause execution for each of the eigenstates, and result in another Junction with the result of each of the executions. An example:
# a simple sub showing its arg, returning twice the value
sub foo($bar) { say $bar; 2 * $bar }
# a simple "or" junction
my $a = 1 | 2 | 3;
say foo($a); # executes for each of eigenstates
# 1
# 2
# 3
# any(2, 4, 6)
So are you telling me that you haven't used any of those?
---
Roles combine all of those features very simply.
role Interface {
    method hello-world ( --> Str ) {...}
    # the ... means it needs to be implemented by consuming class
}
role Abstract {
    has Str $.name is required;
    # adds an accessor method of the same name
    method greet ( --> Str ) {...}
}
role Template[ Real ::Type, Type \initial-value ] {
    has Type $!value = initial-value;
    method get ( --> Type ) {
        $!value
    }
    method set ( Type \new-value ) {
        $!value = new-value
    }
}
my $value = 42 but anon role Mixin {
    method Str ( --> 'Life, the Universe and Everything' ){
    }
}
Roles were heavily influenced by Smalltalk traits. Rather than being limited to those uses, Roles were expanded to include all of those other use-cases as well.
---
Really Roles are a better method of code-reuse than inheritance.
role Animal {
    method species ( --> Str ){...}
    method produces-egg ( --> Bool ){...}
}
role Mammal does Animal {
    method produces-egg ( --> False ){
        # most mammals do not produce eggs.
    }
}
role Can-Fly {
    method flap-wings ( --> 'flap flap' ){
    }
}
class Bat does Mammal does Can-Fly {
    method species ( --> 'Bat' ){
    }
}
class Bird does Animal does Can-Fly {
    method species ( --> 'Bird' ){
    }
    method produces-egg ( --> True ) {
    }
}
class Platypus does Mammal {
    method species ( --> 'Platypus' ){
    }
    method produces-egg ( --> True ) {
        # override Mammal.produces-egg()
    }
}
Of course a simple example doesn't do this ability justice. It really shines on large code-bases.
That is of course about roles in Perl, which doesn't have all the same features. All of the points do apply to Raku roles though.
---
Raku has so many good ideas it would be a waste if other languages didn't copy at least some of them. I of course can understand if a single language doesn't want to copy all of them at the same time.
It would definitely be a waste if no other language tries to combine regular expressions, parsers, and objects like Raku grammars have done.
At the very least Raku regular expressions are easier to understand than Perl compatible regular expressions. (Note that I very much DO understand PCRE syntax, having used it heavily in Perl for many years.)
I am willing to bet you $1000 that in 2030 there are more jobs on general $job_board of your choice that mention Perl than Raku.
We can say 2040 if you think that 2030 may be too soon for Raku to have any chance at all. But there is a good chance that one of us will be dead by then. (I'll be 70, I think you'll be in your 80s.)
My point being that if you yourself are not confident enough to take some bet of that form, you cannot expect people to take you seriously when you describe Raku as "where the puck will be". Particularly not people who are happy to explain why they think that Raku won't do that, and are willing to make bets of that kind.
Bet taken. With the current rate of Perl's decline, I think that's a very safe bet to take.
Awesome. I know how hard it is to get rid of legacy code, and there are a lot more startups that I'm aware of starting with Perl today than starting with Raku.
I know I'm an old hack, but not that old: I'll be 73 for 98% of 2030.
The "then" that I was referring to in that sentence was 20 years from now. Which is 2040. As I said, you'll be in your 80s at that point.
Ruby is the best version of Perl. You can do command line one-liners and full-on object-oriented (SmallTalk object model) readable, maintainable programs.
My impression 20 years ago was that Ruby is an interesting mix between Perl and Python. In principle there is little that differentiates Perl and Ruby in terms of how maintainable or not their code can be.
However the Ruby ecosystem wound up with a lot of modules contributed by people who had just moved to it from languages like Java. They overreacted to their newfound freedom. The result is that between a poor testing story and questionable practices like "monkeypatching" (literally modules overwriting random methods in other modules) the Ruby ecosystem wound up with a lot of nasty gotchas. (There is a lot of "if you load module A then B, it works, but load module B then A and it doesn't.")
Yes, Ruby programmers get up in arms when you say that they have a poor testing story. But ask them whether by default they have actually run unit tests for everything installed on their system, and they have not. Ask them if they could run unit tests and they think they can. But those who I have watched try have found out the hard way how many unit tests were only written for the original author to run in the original environment, and can't easily be run in an automated way. By contrast the default for CPAN is that every module has had its unit tests run on every system it is installed in, and automated smoke tests ensure that modules have had their tests run on a wide variety of operating systems and versions of Perl.
The result is that random Ruby module X is generally less likely to be dependable than random Perl module Y. Which in turn means that in my experience significant Ruby code bases written by competent programmers top out at smaller than Perl, with worse maintenance stories.
That doesn't discount the fact that there has been a tremendous amount of unmaintainable Perl written by incompetent programmers. (Particularly during the wild dot com days.) But "maintainable" is NOT something that Ruby has a good story to tell about.
Ruby is about equal to Perl as a language for interactive command line usage, and both are better than Python.
Comparing RubyGems to CPAN, CPAN is about 2x as large, has a better infrastructure, better testing, and is generally better.
Comparing CPAN to PyPI (the Python version), they are about the same size; PyPI has a worse testing story, has more up-to-date modules, is growing faster and seems to be of similar quality. If you want to write a system that integrates with a recent standard, want support from Google, or want to use something like machine learning, Python is the clear winner.
Adding JavaScript, I consider node.js to have the worst command line story, worst repository system, but it is extremely popular.
I personally use Perl for command line stuff and Python otherwise. I use JavaScript when I have to (and sadly I have to a lot). It is rare for me to bother with Ruby. But I learned Perl first, and have written more in Perl than the others combined.