• Obviously, this is typeset with TeX.
• Though Knuth originally created TeX for books rather than single-page articles, it's the tool he knows best, so it's unsurprising that he'd use it to just type something out. (I remember reading somewhere that Joel Spolsky, who was a PM on Excel, used Excel for everything.)
• To create the PDF, where most modern TeX users might just use pdftex, he seems to have first created a DVI file with tex (see the PDF's title “huang.dvi”), then gone via dvips (version 5.98, from 2009) to convert to PostScript, then (perhaps on another computer?) used “Acrobat Distiller 19.0 (Macintosh)” to go from PS to PDF.
• If you find it different from the “typical” paper typeset with LaTeX, remember that Knuth doesn't use LaTeX; this is typeset in plain TeX. :-) Unlike LaTeX, which aims to be a “document preparation system” with “logical”/“structured” (“semantic”) markup rather than visual formatting, for Knuth TeX is just a tool: typically he works with pencil and paper, and uses a computer/TeX only for the final typesetting, where all he needs is control over the formatting.
• Despite being typeset with TeX which is supposed to produce beautiful results, the document may appear very poor on your computer screen (at least it did when I first viewed it on a Linux desktop; on a Mac laptop with Retina display it looks much better though somewhat “light”). But if you zoom in quite a bit, or print it, it looks great. The reason is that Knuth uses bitmap (raster) fonts, not vector fonts like the rest of the world. Once bitten by “advances” in font technology (his original motivation to create TeX & METAFONT), he now prefers to use bitmap fonts and completely specify the appearance (when printed/viewed on a sufficiently high-resolution device anyway), rather than use vector fonts where the precise rasterization is up to the PDF viewer.
• An extension of the same point: everything in his workflow is optimized for print, not onscreen rendering. For instance, the PDF title is left as “huang.dvi” (because no one can look at it when printed), the characters are not copyable, etc. (All these problems are fixable with TeX too these days.)
• Note what Knuth has done here: he's taken a published paper, understood it well, thought hard about it, and come up with (what he feels is) the “best” way to present this result. This has been his primary activity all his life, with The Art of Computer Programming, etc. Every page of TAOCP is full of results from the research literature that Knuth has often understood better than even the original authors, and presented in a great and uniform style — those who say TAOCP is hard to read or boring(!) just have to compare against the original papers to understand Knuth's achievement. He's basically “digested” the entire literature, passed it through his personal interestingness filter, and presented it in an engaging style, with enthusiasm to explain and share.
> when Knuth won the Kyoto Prize after TAOCP Volume 3, there was a faculty reception at Stanford. McCarthy congratulated Knuth and said, "You must have read 500 papers before writing it." Knuth answered, "Actually, it was 5,000." Ever since, I look at TAOCP and consider that each page is the witty and insightful synthesis of ten scholarly papers, with added Knuth insights and inventions.
(https://blog.computationalcomplexity.org/2011/10/john-mccart...)
• I remember a lunchtime conversation with some colleagues at work a few years ago, where the topic of the Turing Award came up. Someone mentioned that Knuth won the Turing Award for writing (3 volumes of) TAOCP, and the other person did not find it plausible, and said something like “The Turing Award is not given for writing textbooks; it's given for doing important research...” — but in fact Knuth did receive the award for writing TAOCP; writing and summarizing other people's work is his way of doing research, advancing the field by unifying many disparate ideas and extending them. When he invented the Knuth-Morris-Pratt algorithm, in his mind he was “merely” applying Cook's theorem on automata to a special case; when he invented LR parsing, he was “merely” summarizing various approaches he had collected for writing his book on compilers; etc. Even his recent volumes/fascicles of TAOCP are breaking new ground (e.g. currently, simply by trying to write about Dancing Links as well as he can, he's coming up with applications of it to min-cost exact covers, etc.).
Sorry for long comment, got carried away :-)
Looks like no one complained about the long comment, so some further trivia I omitted mentioning:
• The problem that one cannot copy text from a PDF created via dvips and using METAFONT-generated bitmap fonts has recently been fixed — the original author of dvips, Tomas Rokicki ([1], [2]) has “come out of retirement” (as far as this program is concerned anyway) to fix this and is giving a talk about it next week at the TeX Users Group conference ([3], [4]):
> Admittedly this is a rare path these days; most people are using pdfTeX or using Type 1 fonts with dvips, but at least one prominent user continues to use bitmap fonts.
So in the future (when/if Knuth upgrades!) his PDFs too will be searchable. :-)
• In some sense, even Knuth's work on TeX and METAFONT can be seen as an extension of his drive to understand and explain (in his own unique way) others' work: at one point, suddenly having to worry about the appearance of his books, he took the time to learn intensively about typesetting and font design, then experiment and encode whatever he had learned into programs of production quality (given constraints of the time). This is in keeping with his philosophy: “Science is what we understand well enough to explain to a computer. Art is everything else we do.” and (paraphrasing from a few mentions like [5] and [6]) “The best way to understand something is to teach it to a computer”.
• Finally returning (somewhat) to the topic, and looking at the 2/3rds-page proof that Knuth posted [7], one may ask, is it really any “better”, or “simpler”, than Huang's original proof [8]? After all, Huang's proof is already very short: just about a page and a half, for a problem that had been open for 30 years; see the recent Quanta article ([9], HN discussion [10]). And by using Cauchy’s Interlace Theorem, graph terminology, and eigenvalues, it puts the theorem in context and (to researchers in the field) in a “natural” setting, compared to Knuth's proof that cuts through all that and keeps only the unavoidable bare essentials. This is a valid objection; my response would be: different readers are different, and there are surely some readers to whom a proof that does not even involve eigenvalues is really more accessible. A personal story: in grad school I “learned” the simplex algorithm for linear programming. Actually I never quite learned it, and couldn't answer basic questions about it. Then more recently I discovered Knuth's “literate program” implementing and explaining the simplex algorithm [11], and that one I understood much better.
> The famous simplex procedure is subtle yet not difficult to fathom, even when we are careful to avoid infinite loops. But I always tend to forget the details a short time after seeing them explained in a book. Therefore I will try here to present the algorithm in my own favorite way—which tends to be algebraic and combinatoric rather than geometric—in hopes that the ideas will then be forever memorable, at least in my own mind.
I can relate: although the simplex algorithm has an elegant geometrical interpretation about what happens when it does pivoting etc., and this is the way one “ought” to think about it, somehow I am more comfortable with symbol-pushing, having an underdeveloped intuition for geometry and better intuition for computational processes (algorithms). Reading Knuth's exposition, which may seem pointless to someone more comfortable with the geometrical presentation, “clicked” for me in a way nothing had before.
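To make the “symbol-pushing” flavour concrete, here is a toy sketch of my own (in Python; it is not Knuth's literate program [11], and it ignores how the pivot element is chosen and how degeneracy is handled): the single algebraic step the whole algorithm revolves around, a pivot on a dense tableau.

    def pivot(tableau, r, c):
        # Make column c basic in row r: scale the pivot row so the pivot entry
        # becomes 1, then eliminate column c from every other row.
        p = tableau[r][c]
        tableau[r] = [x / p for x in tableau[r]]
        for i, row in enumerate(tableau):
            if i != r and row[c] != 0:
                factor = row[c]
                tableau[i] = [a - factor * b for a, b in zip(row, tableau[r])]

Deciding which (r, c) to pivot on, and proving that the process terminates, is where all the subtlety lives; the step itself is pure algebra, which is exactly the kind of presentation the quote above is describing.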
This is one reason I am so fascinated by the work of Don Knuth: though I cannot hope to compare myself to him in ability (even his exploits as a college kid are legendary [12]), productivity, or taste, I can relate to some of his aesthetic preferences, such as for certain areas/styles of mathematics/programming over others. Being able to “relate” to someone this well gives you hope that maybe, by adopting some of the same habits that worked for them (e.g. somewhere, Knuth mentions that he tries to start every day by doing whichever thing he's been dreading the most), you'll be able to move a few steps in somewhat the same direction. And if nothing else, it puts me in mind of what Bhavabhuti said many centuries ago [13] about finding someone with the same spirit, so to speak.
> The problem that one cannot copy text from a PDF created via dvips and using METAFONT-generated bitmap fonts has recently been fixed — the original author of dvips, Tomas Rokicki ([1], [2]) has “come out of retirement” (as far as this program is concerned anyway) to fix this and is giving a talk about it next week at the TeX Users Group conference ([3], [4])
Hope that will be filmed and put online, sounds like an intriguing talk to watch!
Wonderful insight into an interesting man. I’ve never understood why he’s spent so much time writing The Art of Computer Programming, but framing it as a deep inclination to explain and summarize the work of others in a beautiful way is a wonderful perspective I’ve never heard anyone put forward before.
In 1993, Knuth requested of Stanford University (who granted his request) that he be allowed to retire early (and take up the title of “Professor Emeritus of The Art of Computer Programming”), so that he could complete TAOCP, which he regards as his life's work: https://cs.stanford.edu/~knuth/retd.html
Needless to say, the measly royalties from a technical book (especially one that is never going to be assigned as a textbook to large classes of undergraduate students) would be tiny in comparison to the Stanford professor salary that he gave up.
Also, if he were doing it as a source of income, he'd probably actually publish the long-awaited volumes so that they can sell and make money, rather than spend decades of full-time labour polishing each one (Vol 3 first came out in 1973; Vol 4A in 2011; Vol 4B is only about 1/3rd done), continually missing estimates, until they were at his desired level of quality.
Insisting on reducing every motivation to whether or not it makes money is such a narrow-minded display of being unable to imagine other ways of viewing the world that it's almost insulting.
My impression is that generally plain TeX is used by people who don't like all the LaTeX macro complexity and like to keep things simple (not easy), like to understand better what's going on, build their own tools, etc. In return they give up the ability to use the abundance of LaTeX packages and standard formatting options, its decades of accumulated code, etc. If you want concrete numbers for a sense of things: on the https://tex.stackexchange.com website, there are currently 179,492 questions (https://tex.stackexchange.com/search?q=is%3Aquestion) and of those 590 have been tagged plain-tex (https://tex.stackexchange.com/search?q=is%3Aquestion+%5Bplai...). Of course these numbers shouldn't be taken too seriously, as there are various selection biases in what kinds of people would use plain TeX versus how many of them would ask questions on a site like that, etc.
Interesting, it changes my understanding of tex a bit, as it's not "only" a building block (despite being mainly used as such). Time to follow those links and do some exploring!
Thanks again for the nice answer; people who share like you did make HN very enjoyable.
Nobody uses TeX by itself; this document is written with “plain TeX”, which is not just TeX, but a TeX package similar to LaTeX, though more presentational. I rarely see things written with plain TeX nowadays, or even with ConTeXt.
Circa 2009 I worked on an application that would render maths tests for teachers using plain TeX. I'm sure there are a few other wild examples (my CV is in eplain TeX).
I was wondering why his books look bad on a computer screen. Thank you very much. I was reading Concrete Mathematics and kept being annoyed by the quality.
I don't think that's relevant or true about "his books" in general, as (for instance) the digital versions of TAOCP look great (and use vector fonts). Obviously the publishers will use their own workflow; in that case it's not as if Knuth prepared a PDF for you to download (as he did here). Maybe the publishers just botched up the electronic version of Concrete Mathematics (as the reviews say), or you bought a copy that's defective somehow.
I did a Knuth research binge just last week, carried along by a wave of awe and admiration, so it was good to read your comment. I knew he stuck with TeX but would have missed that .dvi filetype you mentioned.
Thank you for writing this out. Very interesting read! I very much agree with you that writing textbooks, digesting and refactoring other authors' work, is often as important as the original work itself. Indeed, what is the value of a good idea if you cannot explain it to anyone?
Thanks. Comments like this that go into the technical details, the history of the details and the people behind the technology are one of the main reasons I frequent hacker news.
Thanks for the trivia, although your comment on displays is a bit misleading:
>I first viewed it on a Linux desktop; on a Mac laptop with Retina display it looks much better though somewhat “light”
There's nothing special about Apple displays that makes them "retina"; it's just a (marketing) term they use that denotes a standard for pixel density at a specific viewing distance from the screen. The point I'm trying to make here is that you could have just as easily had a "retina" display on your Linux machine, if not a better one.
Yes, you're right that Linux or Mac is irrelevant (and I should have omitted mentioning the OSes); my point was just that on a typical desktop monitor (a 28-inch monitor at 2560x1440, so about 105 dpi) and at a typical zoom level, the effects of rasterization are very much apparent (and still would be on a so-called “4k” monitor), while on a typical reasonably good laptop display (221 dpi), it's only slightly better. Basically, computer screens still have much lower resolution than a typical office printer (600 dpi), let alone the high-resolution typesetters (5000+ dpi! https://news.ycombinator.com/item?id=20009875) that TeX/METAFONT were designed for.
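If you want to check those figures: pixel density is just the diagonal pixel count divided by the diagonal size in inches. A quick sketch (the 4k comparison line is my own addition):

    from math import hypot

    def ppi(width_px, height_px, diagonal_inches):
        # pixels per inch = diagonal resolution in pixels / diagonal size in inches
        return hypot(width_px, height_px) / diagonal_inches

    print(round(ppi(2560, 1440, 28)))   # ~105, the desktop monitor mentioned above
    print(round(ppi(3840, 2160, 28)))   # ~157 for a 28-inch “4k” panel: still nowhere near 600 dpi print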
One sometimes hears that screens are now high-resolution enough that the usual tricks done for font rendering (http://rastertragedy.com/) — anti-aliasing, sub-pixel rendering, etc — are no longer needed, but in fact we still have a long way to go (if we'll ever get there) and these PDFs that use bitmap fonts serve as a good test case.
I met Knuth a few months ago helping my wife do portrait photography of him (I got to hold lighting/reflectors :) ), and got to chat with him about machine learning and other research computer science results. I was floored at the sheer number of papers and author names he could recall on the fly; he had already read every citation I had. At his age, his mind is as sharp as a tack, and watching him code and demo some stuff to me, he was incredibly adept in his programming environment, far more productive than I would be. I really hope he can finish all of the volumes of his books; it will truly be a gift to humanity.
Here’s a little story of math at 35,000 feet. […] I wrote to Don and sent him a few more related results that Neil and I had gotten […]. As a result, I got a letter from him that I found to be very moving indeed. Of course it was in his familiar pencilled scrawl, written at the bottom of the note that I had sent him. It began as follows:
> > Dear Herb, I am writing this on an airplane while flying to Austin to celebrate Dijkstra’s 70th birthday. This whole subject is beautiful and still ripe for research (after more than 150 years of progress!), so I will tell you what I know about it (or can recall while on a plane) in hopes that it will be useful.
> There followed four pages of tightly handwritten information that was a gold mine of the history of the sequence b(n) and related matters. It contained references to de Rham, Dijkstra, Carlitz, Neil Sloane, Stern (of Stern-Brocot trees fame), Eisenstein, Conway, Zagier, Schönhage, Hardy, Ramanujan, and so forth. It did not merely mention the names. It contained precise statements of the results of interest, of their relationship with the result of Calkin and myself and with each other. It contained a number of Don’s own observations about related questions, and had several conjectures, with background, to offer.
> It was a letter, ladies and gentlemen, that was written by a man who very much loves his subject and the history of his subject, and who knows more about that than any five other human beings that you could name. His enthusiasm, his background knowledge, and his mathematical values, that impelled him to write such a letter under such conditions, can be taken as examples for the rest of us to aspire to.
I met him years ago and had the same reaction. Even when he was relaxed, his attention seemed perfectly focused. His general appearance looked awesome. I don't mean well-dressed, I mean that his eyes and facial expression made him seem to have a brain the size of a small galaxy.
Congratulations Scott, you are one of the very few people to have Knuth make a comment on your blog. Printing out the comment and framing it on your wall may be very appropriate. ;-)
(I know he gives out reward checks, but this is the first time I've seen such a comment. I wonder if anyone knows of any others.)
I find it interesting too in so far as it references Math Overflow...
> Yet at that time, I was hoping for developing something more general using the eigenspace decomposition of the adjacency matrix, like in this unanswered MO question:
There's some serious stuff on Math Overflow and the Mathematics Stack Exchange (https://math.stackexchange.com/). I haven't figured out what the difference in focus between the two sites is, however.
The line can be blurry sometimes, but it's roughly the difference between research and teaching. MathOverflow is meant to help mathematics researchers. At its best, it's like being able to walk over to a colleague's room in a prestigious math department to ask for insight. Math.SE, on the other hand, is for "people studying math at any level and professionals in related fields" and should be used for questions that didn't originate in research, such as hard textbook exercises. Both are super interesting, and even mathematicians will have plenty of questions that are more suitable for Math.SE than MathOverflow.
[SE.Math](https://math.stackexchange.com/) is for pretty much any sort of math question you might have. At least for traditional mathematics; statistics, numerical algorithms, data science, AI, etc., tend to fit better on other StackExchange sites.
[MathOverflow](https://mathoverflow.net/) is for the sorts of math questions that will probably sound like gibberish to anyone who doesn't have a graduate-level education in Math. And I mean specifically in Math, not in Engineering or Physics.
Usually, if you want to ask a math question -- even a reasonably complex one that someone would need a lot of education to understand -- it'll go on SE.Math. MathOverflow's more of a specialty site for mathematicians.
To note it, the distinction between SE.Math and MathOverflow has at least two parallels:
In all three cases, one site's more appropriate for most users, ranging from beginners to experts, while the other site's more specifically for research-level topics in the field.
Cryptography really needs this. Both the Stack Exchange and the Reddit for crypto will typically have at the top of the pile:
1 x post from a cryptocurrency newbie, happy to find an audience who undoubtedly want to buy their FlappyBirdCoins. HODL HODL! We're going to the sky! Actually this post _is_ appropriate and it IS on topic and you're only saying it isn't because you don't understand digital currency is the future. (Grr)
2 x post from people who've read about the One Time Pad and are now convinced their grade school cipher is better and here's 40 characters of output - bet you can't break it. No I didn't read the instructions saying not to do this, I'm too smart to need to read instructions.
1 x post from somebody with a basic grasp who has a real but fairly trivial question, like, why is the exponent 65537 used for RSA ? This question may or may not just be their homework.
1 x post that is a good question but shouldn't be filed here, like, how do I configure Apache to enable TLS 1.3 ?
1 x post which is actually appropriate, like why does this algorithm say to use a pseudo-prime P when it only seems to be important that A isn't a factor of P ? Doesn't this 1987 journal paper I found show that it would work for other values just fine ?
"2013-2018: I revisited this conjecture every time when I learn a new tool, without any success though. But at least thinking about it helps me quickly fall asleep many nights."
I believe Feynman said the best people always had the top unsolved problems at the top of their minds, and whenever a new technique or result arrived, they would drop everything and try out the new attack on the old problem.
I'm kind of surprised it helped him fall asleep. Whenever my brain starts to think about technical problems in the middle of the night I know I have a few hours of wakeful puzzling ahead of me
I had a friend at university who studied mathematics with me (I studied both maths and computer science, but he was a genuine mathematician, and a pretty good one, much better than me), and he told me that he often takes problems "to bed". After that he would often wake up with at least a half solution in his head. Sometimes a full solution.
Ramanujan was like that too. He believed that a goddess sent him solutions in his sleep. Not sure if the goddess was real, but the formulas certainly were.
I find reading technical work makes my brain fall asleep very quickly. I am sure if I read Knuth's books at night I would be asleep within fifteen minutes.
I am not sure thinking about technical work would help sleep, because thinking could be orthogonal to reading.
Oh, I'm with you regarding technical reading being a good way to fall asleep! But reading someone else's train of thought is a different way to engage the brain than trying to figure out a mathematical proof on your own - that's a lot more like what pondering over programming problems would be like for us I think
This speaks to a very interesting difference in biological constitution. Like the two comments below, I find my mind working harder asleep than awake. More than once I went to bed with a problem and woke up either with a solution, or with a much better perspective on the problem.
The gargantuan intellectual effort behind this is incredibly humbling. I can hardly imagine the breadth of knowledge one needs to be able to keep in their mind to tackle this problem.
Here's a quote from Richard Bellman, from his paper on the birth of dynamic programming
<<<What is worth noting about the foregoing development is that I should have seen the application of dynamic programming to control theory several years before. I should have, but I didn’t. It is very well to start a lecture by saying, ‘Clearly, a control process can be regarded as a multistage decision process in which…,’ but it is a bit misleading. Scientific developments can always be made logical and rational with sufficient hindsight. It is amazing, however, how clouded the crystal ball looks beforehand. We all wear such intellectual blinders and make such inexplicable blunders that it is amazing that any progress is made at all.>>>
In simpler, personal terms, "oh you bloody idiot, why didn't you XYZ...?"
I was considering a problem related to a new data structure. There was a part that was plain ugly, just stank. I thought I could do better, so expressed the problem as a programmer might, and looked for regularity. There was a lot of regularity, so there was an underlying structure which I could grasp at, but not reach. Grab, grab, grab. It was like mist.
I realised it involved the edges of an n-dimensional cube. I could see the 2D case, it's trivial, so what did I do? Well I needed to generalise it so I went for the 4D cube. I spent literally weeks muttering to myself like a nutter, half the time wandering like a zombie.
And eventually it clicked. I may, just may have found something new and genuinely useful in combinatorics but that's not the point.
The point is, had I gone from the 2D case to the 3D case the structure I was looking for would have dropped out in an afternoon. Weeks wasted. It was so simple, you bloody idiot, why didn't you...
I'm not a mathematician, but I guess a lot of maths is like that.
(Edit: quoted Bellman extract to delineate it properly)
The proof of Fermat's Last Theorem did not involve many areas of mathematics. It was primarily proved via algebraic geometry.
That’s also not true about Wiles's productivity... he first discovered the proof in '93, but then when he talked to people about it an error was found. He eventually fixed the error in '94, and then in '95 it was published. No mathematician waits to publish the proof of one of the greatest unsolved problems in mathematics.
Talk about a coincidence. I just interviewed (for an article) someone today who happened to mention out of the blue that he's mentioned in the "Art of Computer Programming". He asked me if I "had heard of Donald Knuth." He didn't know if people still read those volumes. :) I let him know folks are very much still interested.
This is especially interesting to think about in light of the whole school of self-teaching Knuth has pushed out into the world in fragmented pieces here and there.
One of the fallbacks is to switch from justified to right aligned text for that one line. Most importantly, the compiler issues a warning so that you can look it up yourself.
I'm sorry, I still don't understand the practical applications of that paper. Does that mean we can now write a "smart" function with an input of booleans and sensitivity, and get results much faster than if we just iterate over the booleans?
While I generally think that looking for practical applications of research is nonsense, this one seems to be in an obviously useful area.
It's all about different ways of measuring how good a function is at scrambling the input data. I'm guessing if you want to break (or make) a hash or cryptosystem you will use these measures over various aspects of it to look for weaknesses or some such.
This particular proof seems to be saying that the measure called sensitivity will give you a similar answer to a bunch of other measures.
On the one hand, that's disappointing (a measure that gave totally different results might enlighten whole new ways of attacking/strengthening your crypto). On the other hand, it is encouraging, because if a whole bunch of very different measures agree, then that's a sign that they are on to something real.
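In case it helps to see what “sensitivity” actually measures: it's the largest number of single-bit flips of an input that each change the function's output. Here is a toy brute-force sketch of my own (exponential in n, which is exactly why you'd want theorems relating sensitivity to other measures rather than computing it directly):

    from itertools import product

    def sensitivity(f, n):
        # s(f): over all inputs x in {0,1}^n, the largest number of coordinates i
        # such that flipping bit i of x changes the value of f(x).
        best = 0
        for x in product((0, 1), repeat=n):
            flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for i in range(n))
            best = max(best, flips)
        return best

    parity = lambda x: sum(x) % 2
    print(sensitivity(parity, 4))         # 4: every bit flip changes the parity
    print(sensitivity(lambda x: 0, 4))    # 0: a constant function is insensitive to everything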
This was a result that everyone expected to be true, but that no one was able to prove: our mathematical tools weren't good enough. Now they are. It's like weightlifting: as your mathematics becomes “stronger”, it becomes capable of doing more things. The applications will come. See this Quanta magazine article and the quotes in it from researchers about the problem: https://www.quantamagazine.org/mathematician-solves-computer...
> But no one could prove it. […] “People wrote long, complicated papers trying to make the tiniest progress,”[…] this power should yield new insights about complexity measures. “It adds to our toolkit for maybe trying to answer other questions in the analysis of Boolean functions,” […]
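For a concrete statement of what was proved (as I understand Huang's paper; corrections welcome): for every Boolean function f,

    s(f) \ge \sqrt{\deg(f)}

where s(f) is the sensitivity and \deg(f) is the degree of the unique multilinear real polynomial representing f. Degree was already known to be polynomially related to block sensitivity, certificate complexity, decision-tree depth and the other standard measures, so this inequality is the missing link that puts sensitivity in the same polynomial-equivalence class. The combinatorial core is the statement that any 2^{n-1}+1 vertices of the n-dimensional hypercube must include one with at least \sqrt{n} neighbours among the chosen vertices.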
It's not my field, so there actually may be practical applications, but in general mathematicians don't really care directly about applications (unless they do). The result stands on its own merit (and may either help in applications later, or simply as a beautiful result, or may later be useful in the proof of some dependent theorem that does have applications).
But essentially a large part of why this is a big deal is that it was a "long"-unproven conjecture that smart people had looked at over the years and not solved, and expected to take heavier machinery, and that all of a sudden got a really simple proof.
I’d like to be corrected if I’m wrong, but this seems to be about arranging data to be better processed in quantum computing, to retrieve parity information. I wonder if this has application to McEliece codes, or future communications.
Not to diminish the value of his books; but I do have a question.
Are his books useful for someone who is trying to improve their daily-job programming skills? I took a look at the indexes of his books; they seem more relevant to when I was into algorithmic programming competitions.
I read all of the first three volumes and attempted every exercise. It took me a little over three years, and most of the focus was on very low-level stuff like random number generation, arbitrary precision math, sorting (of course) and memory management. The first book is half math and then a lot of the rest discusses his assembler language MIX. The middle third or so of volume 3 discusses how to optimally sort files that are stored on tape (!) that are too large to fit into main memory. So, no... rather than to say the books aren't _useful_ for a practicing programmer, I'd say that there are far, far more productive uses of your time. However, I really enjoyed reading them, and I'm glad I did, and I recommend them to everybody. I think they've made me a better programmer in the sense that I can appreciate what goes into the routines that I'm using, but I doubt I'll ever have cause to prove the runtime bounds of an algorithm like Knuth does.
I have peeked at the two new volumes (4a & 4b) and it looks like they spend a lot of time on graph algorithms, which _are_ contemporary and something you might apply directly. All of the examples are in MMIX (a more modern, but still imaginary, assembler dialect), so if you wanted to jump right into 4A, you'll have to learn MMIX first. You can download the MMIX description off of Knuth's website (https://www-cs-faculty.stanford.edu/~knuth/mmix.html) if you want to dip your toes into Knuth before committing to one of the books.
I had an MSCS when I started, but if you did a STEM undergrad, you should be fine - he actually explains everything you need. Don’t get me wrong: the problems are brain-bendingly difficult, but you don’t need to know how to do them before you start.
The answer to that question may come down to philosophy. Does it help your daily work to really understand things that are only slightly connected to what you are doing?
I've only read a tiny bit of TAOCP but I believe reading it would be quite useful in the long run. But if I were to read it, I wouldn't do it for the usefulness but rather because gathering knowledge and understanding is the right thing to do. There must be a million things that are more useful in the short term, and really reading TAOCP is bound to take years.
His books are very far removed from contemporary programming. That said, I would probably read them as a grad student in CS since they're a classic in the field (and Knuth is a funny writer). They introduce you to a lot of math, as well as a more theoretical approach to things like data structures and algos.