And though you didn't ask me, I read it (having been annoyed by your LP post for a few years now) and it strongly resonates with me — the mission, and everything until the start of the “Constraints on a solution” section. IMO the way to give more people more understanding of programs is not to write an entire new programming language / operating system and hope that enough people switch to it, but to work on delivering that “understanding” for existing programs in existing languages. It may be harder, but is more likely to be useful, and you also get feedback on what kind of understanding people want and lack.
Many thanks for the correction, and for the comment! It is absolutely likely that delivering the understanding for existing programs is the way to go. Unfortunately I just don't know enough for that yet. It's a hard chicken-and-egg problem: to understand the current stack I need precisely the concepts and tools that I'm trying to work out.
Over the last few years I've switched back and forth between the two extremes. I spent a couple of years off and on learning how operating systems (OpenBSD, Sortix, a little bit of Linux) work. I've gained some hard-won facility for poking around inside GNU package sources. So it's yet possible that I'll find a way to make progress on an existing stack.
It also depends what the criterion for 'success' is when we consider this "strike out vs work within" tradeoff. If the goal is some level of adoption then it's a no-brainer that working with existing platforms is the way to go. But I may be content just to figure out the right answer for myself.
Working within an existing platform requires a time commitment to a single one (or a tiny subset) of the many projects that our platforms have balkanized into. After all that time commitment, effecting change from within requires a level of politics that is definitely not my strong suit. These projects have real users and justifiably shouldn't be giving me the time of day anytime soon.
One alternative I've considered is forking a mature platform. It remains on my radar, but at the moment I think the drop-off in benefits the instant I hit 'fork' is way too great. Consider a platform like OpenBSD that frankly is created by way smarter people than myself and is way more mature. The level of adoption it gets from being POSIX compliant is so minuscule; would making incompatible changes really be that counter-productive? It's worth asking if you're baiting big to catch small.
On some level my real target is to change the customs that influence how open source projects are governed. Even if I managed to overcome all the previous hurdles, it still seems impossible within the existing framework to do things like encourage more forks in projects, or convince more people to cull their dependencies, or read their sources. There's just too much baggage. Starting afresh may paradoxically make a new way easier to see.
This is the "idea maze" as I see it. I'd love to hear more about what you think of it.
Well it's a noble goal and I certainly don't want to discourage you, however you go about it! Perhaps you will learn something new and useful.
But just to make my meaning clear, by “existing programs in existing languages”, I meant doing it in a way that does not require effecting change at all. What understanding is it possible to deliver for an existing, complicated, messy codebase? For example, you correctly noted that early versions tend to be easier to understand, and code tends to accumulate complexity that makes the global structure less clear. This is true and something I use often: use "blame" to look at where a particular chunk of code was introduced, and look at the corresponding change, along with its description/commit message, which is often simpler. (And I see you've written a tool (http://akkartik.name/post/wart-layers) for those who choose to stay conscious of this and write code in a particular way.) But most software today is available with version history. So, what if this were easier? E.g. imagine if when you view code there's a slider that you can move back and forth to see older or newer versions, while the changes fade out. Or, imagine highlighting the “base” of the code versus the less important changes. Or something; experimentation will reveal what tends to be useful for existing codebases. (And if people find the tools useful, that may even effect change in how code is written, as authors get feedback on what the tool thinks versus their mental understanding, and tweak until there's a match. I've seen TypeScript being sold not for some putative benefits of typing on code correctness but simply for enabling IDE autocomplete, for instance.)
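To make the “base versus later changes” idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `line_origins` is made up, the versions are passed in as plain strings (a real tool would pull them from version control), and lines are tracked across versions with the standard library's `difflib`, which is just one plausible way to do it.

```python
import difflib


def line_origins(versions):
    """Label each line of the newest version with the index of the oldest
    version it can be traced back to.

    `versions` is a list of file contents (strings), oldest first.
    Returns a list of (origin_index, line) pairs for the newest version.
    Hypothetical helper: a real viewer would read versions from history.
    """
    if not versions:
        return []
    prev = versions[0].splitlines()
    origins = [0] * len(prev)  # every line of the first version is "base"
    for idx in range(1, len(versions)):
        cur = versions[idx].splitlines()
        new_origins = [idx] * len(cur)  # assume new until matched to an old line
        matcher = difflib.SequenceMatcher(a=prev, b=cur, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                # An unchanged line inherits the origin of the line it matches.
                new_origins[block.b + k] = origins[block.a + k]
        prev, origins = cur, new_origins
    return list(zip(origins, prev))


if __name__ == "__main__":
    history = [
        "def f(x):\n    return x + 1\n",
        "def f(x):\n    if x is None:\n        raise ValueError('x')\n    return x + 1\n",
    ]
    for origin, line in line_origins(history):
        # A viewer could fade lines with a high origin and emphasize origin 0.
        print(f"v{origin} | {line}")
```

A slider would then amount to choosing a cutoff on the origin index and dimming everything newer than it.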
The broader point is that, to me it seems that your writing and efforts have the implied assumption that understanding of global structure is hard to acquire because everyone is making mistakes, and if everyone is just careful to do things differently, the difficulties will disappear and understandable programs will magically emerge. That is something worth investigating, but I think there's a good chance that perhaps not everyone is making mistakes (as even programs written by the best programmers tend to become hard to understand eventually), and/or that it's not feasible for everyone to be super careful when trying to get things done. (Rather, they're making tradeoffs, and are likely to make similar tradeoffs in future.) Not all the accumulated complexity may be accidental; some is inherent in the fact that the problem in the real world does have messy corner cases (as Spolsky said: https://www.joelonsoftware.com/2000/04/06/things-you-should-...). Similarly most of the causes you identified (backwards compatibility considerations, churn in personnel, vestigial features) (and those identified elsewhere, e.g. in “out of the tar pit”) can be real and unavoidable: there may be features that are needed only for (say) users of old systems but still cannot simply be removed. The best that can be hoped for is to make this fact clearer, not to make them go away.
Finally, there's also the fact that “understanding” is not a property inherent in the system (code, program, whatever) itself, but something that grows in the head of the reader. (Perhaps trying to influence the writer is not the best way...) And different readers come with different questions and goals, and at least as far as the first paragraph of your http://akkartik.name/about goes, may need different sorts of help in different contexts. It's unlikely a fixed organization of the program is going to satisfy everybody.
An example: error handling. We've all seen functions that spend only a few lines doing their “main” job and many more lines checking for errors and dealing with them. This can obscure what the main job of the function is, and make it appear as though error-handling is the main part. (Aside: Knuth observes this causes a psychological barrier against writing too much error handling, while with his literate programming one shunts off the error-handling to a different section/module, and one tends to write better error-handling there.) But consider https://danluu.com/postmortem-lessons/ which says “If you care about building robust systems, the error checking code is the main code!” So depending what kind of understanding a reader is looking for at a certain time, sometimes they may want to understand the “happy” path and sometimes the error handling.
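As a small illustration of that shape (a made-up function, `parse_port`, not taken from any real codebase), here the “main” job is a single line and everything else is error handling:

```python
def parse_port(text):
    """Parse a TCP port number from user-supplied text (hypothetical example)."""
    # Error handling: the input may be missing entirely.
    if text is None:
        raise ValueError("missing port")
    text = text.strip()
    # Error handling: the input may not be a number at all.
    if not text.isdigit():
        raise ValueError(f"port is not a number: {text!r}")

    port = int(text)  # the "main" job of the function

    # Error handling: the number may be outside the valid range.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

A reader after the happy path wants only the `int(text)` line; a reader worried about robustness wants everything around it.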
Similarly in general: given a program, sometimes we want to understand roughly how it's organized / what its major components are, sometimes we want to understand the precise boundaries/interfaces between these components, sometimes we want to understand the sequence of operations the program performs and sometimes the frequency, sometimes we want to understand what it does in the “steady state” and sometimes what it does at startup or shutdown or some corner case. And in fact not always the global structure of a program but sometimes only enough to understand how it solves a specific problem — if I want to make a change to (say) Firefox in an afternoon, I may just want to know how it does (say) font fallback.
All that said, yours is an interesting project, for the reasons you mentioned, and I look forward to what comes out of it. Apologies for a verbose comment; I'll stop here :-)
I really appreciate the detailed comment! I love talking about this stuff.
---
You're absolutely right that code reading is a non-linear activity and that won't ever change.[1] I'm not trying to make all code reading a linear activity; that would be truly quixotic. I want to keep the code organized for the convenience of the writer but still provide an initial orientation for newcomers, when they aren't concerned with error handling or precise interface boundaries.
---
Finding the right version to look at is part of the solution, as you noticed. But there's a second half: making sure the information newcomers need is actually in the repo. Somewhere, anywhere. My sense is that existing codebases don't actually contain all the information needed to truly comprehend them. The context the system runs in, and all the precise issues it guards against. Tests are a huge help here, but I'm constantly making changes to code that I tested in some other terminal or browser window with some complex workflow. Then I often save the code in one window and close the other window. That's a huge loss of information, and it's compounding over and over again in current platforms. All because there are manual tests we can do that aren't easily encoded as automated tests.
It's certainly possible to port my ideas to an existing stack so that more tests can be represented. But how do we recover all the knowledge that has been lost so far?
---
I don't think the problem is that authors make mistakes[2]. They understand the domain far better than an outsider like me. No, the problem is that authors don't capture everything that is in their heads, and that means that I make mistakes when I try to build on their work.
One selfish reason I care about making the global structure of my codebase comprehensible is the very slight hope that others can take it in new directions and add expertise I won't ever gain by myself, in a way that I and others can learn from.
---
"...most of the causes you identified (backwards compatibility considerations, churn in personnel, vestigial features) can be real and unavoidable: there may be features that are needed only for (say) users of old systems but still cannot simply be removed. The best that can be hoped for is to make this fact clearer, not to make them go away."
If the codebase is more comprehensible and captures all compatibility and other concerns as tests, then it becomes easy to fork. Users of old systems could stay on one fork and others could be on a simpler fork that deletes the compatibility tests and the code that implements them. That way they aren't paying a complexity penalty for what they don't use. The tests would make it tractable to exchange code between the two forks, even if they diverge pretty far over time. Or so I hope.
---
Thank you again for your detailed feedback. Now I have a sense of a few more issues to avoid in my writing.
[1] That's partly my complaint with typography in LP: we end up polishing one happy path to death.
[2] If you notice places in my writing that strengthen this impression that I'm trying to reduce mistakes, I'd really appreciate hearing about them. I'm explicitly trying to avoid the failure mode that I think projects like TUNES fall into, of trying to come up with the one perfect architecture.
I also just looked at your profile, and it resonates a lot. Tell me what you think of mine: http://akkartik.name/post/about