I would also recommend the 1st chapter of "Principles Of Computer System Design: An Introduction" by Saltzer & Kaashoek for a more general discussion of complexity in digital systems.
Good to know that there are books addressing this as well. I have the impression that the topic of complexity is not discussed enough, either in academia or in the private sector.
We have to distinguish between environmental entropy and the entropy within the system. As for adding entropy, every action has effects that can simultaneously decrease entropy in one variable while increasing it in another.
Any system's future state is not entirely predictable. When making decisions, we weigh actions by their impact on the current entropy.
1) the author thinks there are ways to decrease the entropy of the coin-flipping example and expects them to be so obvious to the reader that they don't need enumerating to exemplify the approach he has in mind
or
2) the author is pointing out that in the coin-flipping example the entropy is already so close to zero as to render efforts to reduce it absurd, and believes that this is so self-evident as to need no explanation
In other words, the entropy of this article is ~ln(2). I suspect some energy could be usefully devoted to reducing it.
In my opinion, the author aims to introduce entropy as a metric to evaluate the stability of processes, creating room to argue against process changes when new additions are likely to increase entropy by introducing new outcomes.
Example - engineering teams can be obsessed with introducing new variables into a sufficiently stable system with the intention of improving stability, but in turn reduce stability because their impressions of how stable the new variables are turn out to be inaccurate.
This approach generates a bias against change, but in many situations that bias is helpful: it gives engineering teams time to observe the process's outcome distribution over a longer duration, improving the data backing any process-change decision.
The coin flipping example ideally would have S = ln(2), with the two outcomes being:
* heads, no issues
* tails, no issues
The article says this:
"The way to lower that maximum is to bring N to a number as close as possible to one. In addition to reducing the number of possible outcomes, we can further reduce entropy in any given process by reducing the probability of every undesired outcome,[...]"
which applies to a reproducible process. In the case of coin flipping, the desired reproducibility refers to having no issues, so N should be as close as possible to 2.
(the macrostate vs microstate distinction could be introduced but it would complicate the argument)
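To put rough numbers on this, here is a quick sketch of the Shannon entropy S = -Σ p·ln(p) for a few outcome distributions (the probabilities are made up for illustration, not taken from the article):

```
from math import log

def entropy(probs):
    # Shannon entropy S = -sum(p * ln p), in nats
    return -sum(p * log(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # fair coin, both outcomes fine: ln(2) ~ 0.693
print(entropy([0.97, 0.01, 0.01, 0.01]))  # one dominant desired outcome, three rare failures: ~ 0.168
print(log(2), log(4))                     # maximum possible entropy for N = 2 and N = 4 outcomes
```

The second distribution shows both levers from the quote: fewer possible outcomes lowers the ln(N) ceiling, and pushing probability mass onto the desired outcome lowers the actual value further.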
Yeah, sorry, zero if you consider 'heads or tails and nobody dies' to be basically one outcome. Okay, so... how do you reduce the probability of every undesired outcome?
Reducing the probability of every undesired outcome is maybe too optimistic for the first day :-)
You can reduce the probability of every outcome you can initially think of, plus the ones you find out along the way. As time passes you will converge to a situation that is strongly dominated by the desired outcomes.
The central point of the article is that complexity enlarges the undesired-outcome space, even when it is introduced with the intention of reducing it. Perhaps this is less clear than desired.
Okay, so... in the coin flipping case, because entropy is already low, I follow the argument that generally attempts to add complexity are likely to make things worse. There's not much room to make things better to begin with.
Not sure it follows that the same conclusion applies to the other engineering processes you listed, like "configuring a Linux base image for a specific server role" or "setting up a complex cloud environment from scratch" or... "approaching a stranger in a bar"?
Those are already higher entropy processes, with more desirable and more undesirable outcomes, where telling the difference between desirable and undesirable outcomes is much harder to begin with, and so the general advice of 'expend energy to reduce the likelihood of undesirable outcomes' doesn't so immediately suggest that adding what you call 'complexity' is necessarily bad. There are MANY paths to reducing entropy in these sort of situations. Complexity that adds more outcomes can improve the situation in these cases if the new outcomes are both desirable and probable - or at least more probable than the undesirable outcomes they also add.
I just think that if the conclusions from your toy example don't translate obviously to actionable insight into how to improve the toy example, it's unlikely they translate into actionable insight for how to improve real scenarios.
"configuring a Linux base image for a specific server role" or "setting up a complex cloud environment from scratch"
^^^^^^^
I don't know if you have experience working with this kind of thing, but these processes have a single desired outcome and multiple undesired outcomes. They can break in many different ways, and they do break more often than desired.
One simple case related to setting up a Linux instance is when you want the "latest and greatest" Python libraries without even checking whether the stable versions in an LTS distribution would actually suit you. You end up with a huge list of things from pip, a list that sometimes breaks (even with pinned versions) and where security updates do not guarantee backward compatibility. One day it works, the next day it doesn't.
Your process has more entropy than necessary and is less reproducible than desired.
If instead of a server instance you have a Docker image that is rebuilt in a CI/CD pipeline, you have the same problem, but now it blocks an entire team that expects CI/CD to work all the time.
This is just one of many failure points that can coexist: a "corporate" transparent proxy with cache, an unreliable DNS server, a security appliance that interferes with downloads, etc.
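As a rough illustration of how you might at least detect this kind of drift, here is a small Python sketch that compares installed package versions against a pinned list; the package names and versions are placeholders, not anything from the thread:

```
# Compare installed package versions against the pinned versions the
# process expects, and report any drift.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "requests": "2.31.0",   # placeholder pins for illustration
    "numpy": "1.26.4",
}

def check_drift(pinned):
    drift = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            drift[name] = (expected, installed)
    return drift

if __name__ == "__main__":
    for name, (expected, installed) in check_drift(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Detecting the drift doesn't remove it, of course; it just tells you the process is no longer the one you validated.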
As for rebuilding cloud environments, you really have to go through it to see how entropy scales with complexity. I could write a couple of pages about it... and about how people are actually building cloud snowflakes. Here is a short take:
"configuring a Linux base image for a specific server role" doesn't have a single desired outcome. There are a million ways to accomplish it. No two sysadmins would produce the same result.
And in the case where you're re-following the same process but with newer dependencies, there are going to be multiple 'desirable' states you should end up in, in the event that the latest available dependencies aren't compatible.
I just don't follow how this insight is meant to make it possible to evaluate how to get out of this mess?
"configuring a Linux base image for a specific server role" doesn't have a single desired outcome. There are a million ways to accomplish it. No two sysadmins would produce the same result.
This is not right. There are a million ways to implement the process. Once the process is implemented there is a single desired outcome, which is the correct configuration, and every other outcome is undesired - you can have a procedure to test the servers, confirming whether the configuration is correct.
The "million ways to accomplish" define a million different processes with different entropy levels. The ones with lower entropy levels lead you to the desired outcome more often.
"And in the case where you're re-following the same process but with newer dependencies, "
If it brings in newer dependencies when you re-follow it, it is not a good process. That is one of the points to keep in mind.
"I just don't follow how this insight is meant to make it possible to evaluate how to get out of this mess?"
Entropy is a way to count the failure modes of a process. For a given business requirement, choose the simplest possible process that respects it. I gave two examples in the article.
It's interesting joining a new startup and seeing how clean and organized it is, then joining a mature startup and seeing how disorganized it is. There are layers and layers of interpretation of what the systems should look like, and they are all different. We often refer to this as tech debt, but there is always that one box no one wants to touch because of its age and potential importance, kind of like a time capsule from when the company was once clean and orderly. I've seen so much entropy.
But to the point of the article, entropy only exists when external factors are introduced, like new talent or tech paradigms changing the landscape of a startup.