The intermediate binary-tree representation for QuickSort in this article was a “eureka” moment for me. I think algorithm and data structure curricula should take this approach.
No kidding. This is very nice. I might be taking some liberties here, but the point about the shape of the output driving the shape of the program reminds me of using schema information in SQL (think of declared foreign keys) to generate queries: the schema tells you the shape of the data in more ways than just tables and columns, and this in turn tells you about the queries you might want.
If you enrich a relational schema with more metadata, then you can generate even more queries. Think of declaring the set of all relations that define authorization data, not just group memberships, but access controls that refer to groups, define ownership, and so on.
So, for example, 'give me all the authorization-relevant entities "to the right" of this entity' might be an abstract query that, given enough schema metadata, expands into a much more complex SQL query. But the abstract query itself becomes obvious given the shape of the data (i.e., the schema). More concrete examples are "what can this user do?", "who can do what to this resource?", "what groups is this user a member of?", and "what are all the members of this group?" (note: these are not trivial if groups nest), which are all instantiations of "what are all the entities to the {left, right} of this entity, using the set {name} of relations?".
In terms of SQL these are all JOINs or LEFT JOINs (depending on declared relation NULLability, or relation cardinalities), with RECURSIVE queries using UNION (not UNION ALL, to avoid infinite recursion in the presence of circular relations) for "nesting" relations (from a table to itself, perhaps crossing relations on other tables in the process). SQL is extremely powerful and expressive, but we can make these queries pithier and more obvious with a more abstract language that takes advantage of schema metadata. Indeed, graph databases are such a language.
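The nested-group case can be sketched with a recursive CTE. Here is a minimal sqlite3 sketch; the `member_of(member, grp)` table and the sample data are hypothetical, chosen to include a cycle so the UNION-vs-UNION-ALL point is visible.

```python
import sqlite3

# Hypothetical schema: member_of(member, grp) relates a user or a group
# to the group that directly contains it. Groups may nest and may even
# form a cycle; UNION (not UNION ALL) deduplicates rows, so the
# recursion still terminates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_of (member TEXT, grp TEXT);
    INSERT INTO member_of VALUES
        ('alice', 'devs'),
        ('devs',  'engineering'),
        ('engineering', 'staff'),
        ('staff', 'engineering');  -- deliberate cycle
""")

# "What groups is this user a member of?" -- all entities "to the
# right" of 'alice' along the member_of relation, transitively.
rows = conn.execute("""
    WITH RECURSIVE groups(g) AS (
        SELECT grp FROM member_of WHERE member = 'alice'
        UNION
        SELECT m.grp FROM member_of m JOIN groups ON m.member = groups.g
    )
    SELECT g FROM groups ORDER BY g;
""").fetchall()
print([g for (g,) in rows])  # ['devs', 'engineering', 'staff']
```

Swapping `UNION` for `UNION ALL` here would loop forever on the `staff → engineering` cycle, which is exactly the failure mode mentioned above.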
SQL needs a richer schema definition language (extensions will do) by which to formally describe more of the shape of the data.
Completely agree. Maybe it’s because CS curricula tend to teach searching/sorting algorithms before data structures, specifically BSTs. Not sure why...
If you think about it, a BST defines order and a sorting algorithm achieves it, so naturally the output of a sort can be seen as a BST (flattening it to a list is somewhat irrelevant).
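That correspondence can be made literal: build an (unbalanced) BST by repeated partitioning, then flatten it in-order. A Python sketch of the idea (function names are mine, not from the article):

```python
# QuickSort made explicit as build-a-BST-then-flatten.
# A tree is None (empty) or a triple (left, pivot, right).

def build(xs):
    """The pivot's partition step *is* BST construction."""
    if not xs:
        return None
    pivot, rest = xs[0], xs[1:]
    left = build([x for x in rest if x < pivot])
    right = build([x for x in rest if x >= pivot])
    return (left, pivot, right)

def flatten(tree):
    """In-order traversal of a BST yields its elements sorted."""
    if tree is None:
        return []
    left, pivot, right = tree
    return flatten(left) + [pivot] + flatten(right)

def quicksort(xs):
    return flatten(build(xs))

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

The intermediate tree is exactly the recursion tree of the usual in-place implementation; deforestation, as the article notes, is the optimization that would fuse `build` and `flatten` and make the tree disappear again.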
> Maybe it’s because CS curricula tend to teach searching/sorting algorithms before data structures, specifically BSTs.
I think it's more because sorting is all about "efficiency", and sorting things without allocating a ton of intermediate data structures is viewed as more efficient. The blog does mention the deforestation optimization, which (if implemented sufficiently heroically) would remove the intermediate tree in the QuickSort example. But now we're straying pretty far from a simple sorting algorithm in a first high-level algorithms course to a more complex and less abstract beast that is only maximally efficient if you assume a heroic compiler.
As a side note, heap sort (https://en.wikipedia.org/wiki/Heapsort) does perform sorting by constructing an intermediate heap data structure. It's just that the data structure is implicit as it is built in-place in the array to be sorted.
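A minimal sketch of that point: in heap sort the array itself is the heap, so no separate structure is ever allocated (helper names are mine).

```python
def heapsort(a):
    """In-place heapsort: the heap lives inside the array being sorted."""
    def sift_down(start, end):
        # Restore the max-heap property for the subtree rooted at
        # `start`, considering only indices < end.
        root = start
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child] < a[child + 1]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    n = len(a)
    # Phase 1: build the implicit max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Phase 2: repeatedly swap the max to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```

The two loops are "heapify" and repeated "extract-max"; everything happens inside `a`, which is what makes the data structure implicit.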
> If you think about it, a BST defines order and a sorting algorithm achieves it, so naturally the output of a sort can be seen as a BST (flattening it to a list is somewhat irrelevant).
> I think it's more because sorting is all about "efficiency", and sorting things without allocating a ton of intermediate data structures is viewed as more efficient.
I agree that probably explains a lot of it.
However, the cost to efficiency is not that large and may be worth the gains in conceptual clarity. First, the traditional implementation of QuickSort has an implicit (or virtual?) intermediate data structure, the function call tree. Second, I have limited understanding of lazy evaluation, but in a lazy language I believe ‘flatten’ would begin consuming the tree as it is still being built by ‘build’.
> First, the traditional implementation of QuickSort has an implicit (or virtual?) intermediate data structure, the function call tree.
That's a good point, but function calls have a constant cost in terms of stack frame allocation and deallocation.
> Second, I have limited understanding of lazy evaluation, but in a lazy language I believe ‘flatten’ would begin consuming the tree as it is still being built by ‘build’.
Yes, that is correct. The entire tree is not allocated at once, only the nodes that are actually needed, as well as dynamic data representing suspended computations of the rest of the data structure. But (again, ignoring compiler optimizations that would make them go away) these are allocations on the garbage-collected heap and thus incur GC costs and complexity.
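Python generators can mimic that behavior: each subtree is a suspended computation, and the in-order traversal forces only what it actually reaches. A sketch (names are mine):

```python
def qsort_lazy(xs):
    """Lazy quicksort: yields elements in order, partitioning each
    subtree only when the traversal actually reaches it."""
    if not xs:
        return
    pivot, rest = xs[0], xs[1:]
    yield from qsort_lazy([x for x in rest if x < pivot])
    yield pivot
    yield from qsort_lazy([x for x in rest if x >= pivot])

it = qsort_lazy([7, 2, 9, 4, 1])
print(next(it), next(it))  # 1 2  -- right subtrees not yet partitioned
print(list(it))            # [4, 7, 9]
```

Pulling only the first few elements never partitions the right-hand subtrees, which is the generator analogue of `flatten` consuming the tree while `build` is still producing it. (And as noted above, the suspensions are heap allocations; the laziness isn't free.)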
Overall I agree that there would be value in illustrating algorithms by also showing the version that constructs the computation tree. But I don't think it should be the one presentation of the algorithm, only a device to further understanding of the "real" implementation.
Yes, definitely. For example, the explanation of MergeSort in CLRS’s “Introduction to Algorithms” has a figure which shows a tree of lists being transformed into another, shorter tree of lists where the lower nodes have been merged.
Funny how the author gave this talk in honor of Felleisen on his 60th birthday. "Hi, to honor you I'll talk about how one of your better known works is incomplete and only shows half of the picture".
I started reading HTDP long ago but at the time I made the huge mistake of taking a detour to learn Racket, and later Scheme... HTDP provides small languages just to save students from the herculean effort of having to learn a whole programming language and environment. Way to miss the point.
Anyway, there's some associated documentation on the Racket site about a library called "2htdp/universe" [1]. The architecture described is very close to the so-called "Elm architecture" [2] or even Redux [3]. Probably (maybe?) there's no relation, though, since this architecture appears to arise naturally when working with functional programming languages.
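The common shape shared by all three is: an immutable model, a pure update function from (model, event) to a new model, and a view that renders the model. A toy sketch of that loop (not 2htdp/universe's or Redux's actual API):

```python
# Elm/Redux-style loop in miniature: state is only ever replaced,
# never mutated, by a pure update function.

def update(model, event):
    """Pure: returns a fresh model, never mutates the old one."""
    if event == "tick":
        return {**model, "ticks": model["ticks"] + 1}
    if event == "reset":
        return {**model, "ticks": 0}
    return model

def view(model):
    """Pure rendering of the current model."""
    return f"ticks: {model['ticks']}"

model = {"ticks": 0}
for event in ["tick", "tick", "tick", "reset", "tick"]:
    model = update(model, event)
print(view(model))  # ticks: 1
```

Because `update` is pure, the whole history of states can be replayed or inspected, which is the property Redux's devtools and Elm's debugger are built on.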
Newton's "..standing on the shoulders of giants.." was a dig at Hooke (who was short). It's Isaac being an arrogant jerk, which he was in spades. (Even if he revolutionized thinking in a number of areas.) It's not a statement of humility.
I don't know, I've heard this interpretation before, but honestly the phrase makes sense when interpreted the other way around too, so why not just take the first meaning?
The dig at Hooke is a fun knowledge nugget anyway, that silly Isaac. :-p
Something along these lines is still present, I think, in HtDP, at least in the chapter about abstraction (sections 16.1, 16.5, and 16.6).
Basically, when designing a function one can use existing abstractions (or, I would add, design her own abstraction) whose whole signature, output included, matches the purpose and signature of the function to be designed.
So, if the problem at hand is to design a function that takes a natural number and produces a list, that calls for using the built-in `build-list`, whose signature is `N [N -> X] -> [List-of X]`. Or if the output of a function on lists is a Boolean value, one has to keep an eye on `ormap`/`andmap`, etc. So the output is also essential to choose the suitable abstraction.
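The same output-driven choice exists in Python's builtins (analogues of, not identical to, HtDP's operators): a function whose output is a list built from a natural number suggests a comprehension over `range(n)`, and a Boolean output over a list suggests `any`/`all`.

```python
# N [N -> X] -> [List-of X]: the output shape calls for build-list.
def build_list(n, f):
    """Python analogue of Racket's build-list: [f(0), ..., f(n-1)]."""
    return [f(i) for i in range(n)]

print(build_list(5, lambda i: i * i))  # [0, 1, 4, 9, 16]

# [List-of X] -> Boolean: the output shape calls for ormap/andmap.
xs = [2, 4, 6, 7]
print(any(x % 2 == 1 for x in xs))  # True   (ormap odd? xs)
print(all(x % 2 == 0 for x in xs))  # False  (andmap even? xs)
```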
I'm pretty sure the ads are not put up by the author but by WordPress.com. According to their pricing page[1], they "sometimes" display ads on blogs to cover costs.
As a follow-up, I think the humor is that I imagine this blog might have valuable information, but you’re turning off your target audience with low-brow ads.
For the record, the ads are all WordPress's doing, not mine. I have no control over them, and derive no benefit from them (on the contrary, they put me off too!).