Oh this, I've known for years. I learned it early, as a new OS engineer at Convergent Technologies. There was a nasty part of the kernel, where programmable timers lived, that was a large part of the change history of the OS. Looking at it, I saw folks had been thrashing for some time, to get all the issues resolved. But it was a '10 lbs of feathers in a 5 lb box' kind of thing. One change would cause another issue, or regress a bug.
So I took 2 days, made a chart of every path and pattern of events (restarting a timer from a timer callback; having another time interval expire while processing the previous; restarting while processing and another interval expires; restarted timer expires before completing previous expiration and on and on). Then writing exhaustive code to deal with every case. Then running every degenerate case in test code until it survived for hours.
It never had to be addressed again. But it did have to be addressed. So many folks are unwilling to face the music with complexity.
Regarding being 'unwilling to face the music', is it possible that you were only able to perform this refactor due to all the 'thrashing' that came before you? Is it possible that the majority of commits leading up to your refactor were dealing with unforeseen complexities and addressing real bug fixes that you were able to conveniently take in and synthesize all at once?
Refactoring too soon, before understanding the entire system (and all the possible changes/additions/issues that could arise) is probably worse (time-wise) than thrashing for a bit.
As I recall (and it was a long time ago) the history gave some insight into the complexity. But nobody had exhaustively enumerated all the cases. I remember the chart had more conditions that had ever been addressed.
You'd be surprised how few people look through commit history while bug fixing.
There's a steady stream of people who don't understand why 'breaking' the commit history is a problem, and assume you have some sort of untreated OCD if you bother to even care.
100%. Apparently relatively few have experienced the power of `git bisect run`. More than once I've had to explain that no, this isn't just some idiosyncratic aesthetic preference, there's a reason your version control tooling is designed the way it is.
Higher level tooling doesn't prioritize this. AFAIK neither Gitlab nor Github have an actual option in their CI implementations for "test every commit before allowing merge". And both of those are everywhere.
There's also nothing out there which enforces practice - i.e. new code must be covered by some amount of test case (I'll settle for "it is executed in some way as part of build").
These are all things that could be done, but they're not priorities to the big players evidently, and good luck to me trying to get any organization to commit to putting that tooling in or running something custom that does it.
(The problem with this "unit of change" theorizing is obvious, especially considering GH only recently added a squash-merge feature.)
vs. Torvalds on GitHub PRs:
> Git comes with a nice pull-request generation module, but github instead decided to replace it with their own totally inferior version. As a result, I consider github useless for these kinds of things. It's fine for hosting, but the pull requests and the online commit editing, are just pure garbage.
I just set this up yesterday for a new codebase with GitHub actions. Very straightforward, took an hour or two. Only odd part was that the GitHub actions wizard saves the yaml in the root instead of .github/workflows where it needs to be.
Directions vary depending on what you’re doing, but for a React app this will work.
So click on the actions tab in the repo, and click the nodejs template.
Change the run command to the npm script you want to run like npm run ci-check. It will do something along the lines of npm ci && npm run lint-check && npm run format-check && npm run test && npm run build.
Save that. Then move it from the repo root into a new subdirectory you’ll need to create called .github/workflows.
If you are an admin, click the settings tab and protect your merge branch so you can only merge if this succeeds.
Oh, that's a misunderstanding then. I think what was meant by "test every commit before allowing merge" was testing each individual commit in a PR branch, not just testing PRs before merging. I don't think your solution does this. :-(
This setup does indeed test every commit in a pr branch. For the “on” property, it only has to be set to “pull_request” to do so. If you also add “commit”, it runs it twice.
It does exactly what you wanted it to. Feel free to email me if you need help setting it up, happy to help. Cheers!
This is really weird. The nodejs template (https://github.com/actions/starter-workflows/blob/master/ci/...) is set up to test on "pull_request" and "push". There's no mention of "commit" anywhere in the documentation, and when I try to add it there in the GitHub editor, it tells me it's incorrect. Are we talking about the same thing?
But this only works if you push each commit separately. If you push them all at once, it only tests the last one. Also, whenever you rebase, only the last commit is tested.
> Higher level tooling doesn't prioritize this. AFAIK neither Gitlab nor Github have an actual option in their CI implementations for "test every commit before allowing merge". And both of those are everywhere.
Of course not, because that would be silly.
Nobody cares when code was committed, because that’s not when bugs are introduced into working software.
Bugs are introduced when a branch full of commits is merged into working software. How could it be any other way? Your CI should be testing “what would happen if I merged this”, not “did every commit in `dest...source` pass checks too”.
If you want your history to consist only of working commits that are green, then squash when merging. GitHub and GitLab (and bitbucket) all support the ability to enforce this as a policy on a repo.
In the context of git bisect, it's vital. You can't go looking for bugs commit by commit with no requirement that any individual commit is actually buildable or functional.
In my own projects I'm as guilty of this as anyone, because it's nigh impossible to get the tools to actually do this (which has a massive effect on the culture which builds up around them).
Learning git bisect starts being worth it if you have more than let's say 8 commits between the last good version and first bad version, and usually you don't (in small projects at least).
> How anyone could even hypothetically arrive at this result is beyond me.
Most of the time bugs are fixed shortly after they are introduced. So there's few commits between the last known good version and first known bad version. So people can intuitively check some version in between and do bisect that way, exploiting their knowledge to cut some steps. For example if there's 8 commits but 5 of them only change modules that have nothing to do with the bug.
Also if these 8 commits are very small you can treat them as a whole and look at all their changes at once when searching for the bug (and git blame to see the reason for change in particular line).
It's a great tool when I need it, but 99% of the time I don't.
> So people can intuitively check some version in between and do bisect that way, exploiting their knowledge to cut some steps
If you have a hypothesis you hold with enough weight to validate, bisecting in general (let along git bisect) isn't even in the picture yet. The strategy really only enters the picture when you've exhausted any strong hunches and need to debug more systematically.
> and do bisect that way
In context you're not actually describing bisecting.
But taking 'bisect' here as 'bisect': git automates bisecting through the commit history to a greater (with `run`) or lesser (without it) extent. Once you've paid that 15-minute fixed cost of familiarization, `git bisect` is strictly less time- and cognitive-energy-intensive than the manual alternative.
> How anyone could even hypothetically arrive at this result is beyond me
Well, for a handful of commits, it's very easy to do a "manual binary search" for the commit that caused a bug. Not the OP but that's how I read the comment.
If you have a branch which always builds and tests green, you can add a new test, choose a red commit and a green commit and run `git bisect` in a loop to automatically get to the commit that caused the behaviour in that test to regress.
Did just that last week.. But for some reason, each time I share the result of a git bisect some thinks that I'm a wizard and instead of learning how to do it themselves they ask me later..
Thanks but its git bisect the magic tool not me (writing the shell script did take me some times though).
Say someone finds a bug in your software. They're running release 4.1.2. They'd just upgraded from 2.6.4, which is three years old, where what they're doing used to work.
You put together a test case and sure enough, it fails on 4.1.2 but passes on 2.6.4. But when did it get broken, and how? The software is complex and you can't see an obvious problem. And looking at the history, there's 4000 commits between those two tagged releases. It would take forever to test them all.
You can do something smarter, though. You pick a point somewhere in the middle. Maybe it's around version 3.2.0. You test that version, and it passes. So - assuming the bug only got introduced once - the problematic commit lies somewhere between 3.2.0 and 4.1.2. You pick a version between those, and repeat the process. Rather than testing 4000 different versions, you only have to test about log2(4000) = ~12 versions of the code.
The git bisect command is designed to help you with this. If you run the following commands:
$ git bisect start
$ git checkout v3.2.0
$ <run test, which passes>
$ git bisect good
$ git checkout v4.1.2
$ <run test, which fails>
$ git bisect bad
...then git bisect will take over, choose the commit half way between those, check it out for you and prompt you to run your tests and report the result with 'git bisect good' or 'git bisect bad'. Then depending on the result, it will choose the next commit to test, and repeat the process until it can tell you exactly which commit introduced the bug.
But maybe your test takes a while to run. Even with fewer intermediate versions to test, you're going to spend all day running tests, waiting for the results and telling git bisect what to do next.
So you automate your test into a little script. If the test passes, it exits with a result of zero (success). If it fails, it returns non-zero.
And then you run:
$ git bisect run ./my-test-script
and go do something else with your day. When you come back, it will have automatically found the commit that introduced the error. Magic!
But there's a cost. For this to work, you need discipline from day one. You can't have commits in the history that say "WIP, changed some stuff, not finished yet" and others that fix things up later in the branch. You need to make sure each commit is a fully self-contained change that leaves the code in a working, testable state.
If you have a small number of individual commits at which the code won't build cleanly, or which break testing for whatever reason, you can work around them with 'git bisect skip' - and you can implement this in your test script with a special return code. But if this happens too much, the whole approach becomes unmanageable.
Sort of. The only issue with merges in GitHub tends to be that merge or squashed commit itself isn’t tested before being committed. It’s always possible, though in practice this should be quite rare, that the merge or squash commit produces a different result than any of the individual commits did. I suppose you could do something like a pre-push hook that would run tests before sharing your code with everyone else, though I haven’t looked recently at what options are available with GitHub.
Edit: I stand somewhat corrected, seems most CI systems actually test the merged code - https://github.com/actions/checkout/issues/15#issuecomment-5... which I presume includes manually merged scenarios also. That said, they don’t appear to test squashed commits, under the assumption I suppose that any series of sequential changes will always cleanly squash with upstream as long as there are no merge conflicts.
When you use squash-and-merge approaches to GitHub PRs, you have to make sure you automatically delete old branches, as the connection between branch and commit only lives within the GitHub PR at that point (and maybe in a commit comment somewhere, not sure). But it’s not as explicit as merges which include the name of the branch in the merge commit.
> seems most CI systems actually test the merged code
The thrust of your point is right, though. CI tests the merged squash-commit after it's already on master, so you can still get into states where you have a failing build on master.
That risk is mitigated somewhat if you block the merging of out-of-date PR branches but this isn't supported on all the platforms.
My company really pushes finding the cause of bugs since we maintain older versions of our enterprise software and might need to push back patches. Multiple times I've been blamed for bugs just because someone only looked at the first piece of commit history where maybe I moved a big chunk of code into a function or something, when really the bug was introduced 10 revisions back. I'm tempted to take a long vacation to write a code blaming AI.
Commit history has saved my ass more times than I can count. Especially useful on a codebase that has churned through a lot of developers that you've never met.
Yeah, including other people's bugs. Worked a few places where people would just give up on bugs well before I did, and less than half of that was perseverance.
The first time I recall doing this was when a QA person came to me to complain that in the last handful of releases we had done, were cycling back and forth between 2 bugs and they had just caught the same thing starting again in a pre-release build. After that it just kinda went into the toolbox to look at why code got a certain way instead of assuming the dev wasn't paying attention (or was in fact dim).
If you know what someone is trying to do, you can find a solution that solves both problems (without necessarily using either solution).
I hope that at some point someone would have had the same idea.
You always think this is going to be the last fix, so rewriting and enumerating the whole thing seems like a wasted effort, until you do it once too often and it becomes clear you need to rethink your approach.
A great example of eliminating accidental complexity by embracing and clarifying the essential complexity. I have seen many such systems where a single exhaustive state machine could replace a mess of buggy and incomprehensible spaghetti code.
I think a critical skill for a developer is to be able to realize when you are frustrated, distance yourself from the problem and ask what is going on, 5 Why's style.
Developers know about the Boiled Frog problem. But we sometimes don't know when we are the frog. We keep hill-climbing one solution because we are only trying to do one little thing so of course I should be able to do a small to medium thing to accomplish it.
We also send out mixed messages on rewrites, and I'm guilty of it too. But there's gotta be some litmus test where we compare the number of interacting states in the requirements versus the number of interacting states in the code and just call Time of Death.
State machines are a surprisingly powerful tool. They do require you to capture all the possible states of the system; the combinational explosion of "X while Y but not while Z" can get a bit nasty, but that can often force a redesign simply to reduce the state space.
If all your logic is immutable apart from a small area around the state machine the state gets much easier to manage. You write everything as f(state, inputs) => next_state. You can at any point inspect or print the state of the state machine, and record its transitions so you have a clearer idea of how it got there if there was indeed a bug.
At the hardware level there are countable events (interrupt, reprogram the device, change the timer list, callback, call from callback, etc) that can be listed and all combinations dealt with.
I recall part of the solution was, scheduling callbacks. So there wasn't infinite recursion (call a timer expiration callback, have it re-start the timer, have it expire, call THAT callback and so on).
> but that can often force a redesign simply to reduce the state space
I'm convinced that this idea, or something similar, is essentially what it means to write "algorithmic code".
At a certain level, all code is just an algorithm (i.e. logic + control). But when we talk about Algorithms that one my find in a book on such things, inevitably what you have is code that was written to avoid unnecessary states in order to enforce certain guarantees with regard to performance, security, fault-tolerance, or some other desirable property.
To me, there's an "algorithm" vs "business logic" continuum, where algorithms operate on cleanly specified mathematical operands and the difficulty is .. mathematical, while business logic isn't conceptually complicated but can be very wordy.
A lot of the work in translating a business spec into algorithms is what JoeAltmaier did: enumerate all the cases. You end up with boxes labelled "on Thursdays when the user is in the US Virgin Islands and is using the compact UI", with conflicting assumptions from the postit notes taken from the meetings, and someone has to go and check what should happen in that box.
I agree that there is a continuum. I would say it's based on how easy it is to generalize over the state space. Stuff like sorting, cryptography, and consensus algorithms are often quite amenable to clean generalizations (which is often why abstract algebra can be useful in those domains[0]).
There's also another axis which is the strength of the guarantees provided. Most interesting software, whether it is has many or few special cases, has strong guarantees that one is trying to make. Often in order to make stronger guarantees you have to add special case handling, but software which manages to have both strong guarantees and few special cases is the most interesting kind.
But even stuff that is littered with special cases, whether it's query engines, control systems, workflow engines, resource schedulers, or chat applications, I think can benefit from the type of algorithmic thinking which strives to make stronger guarantees with fewer special cases.
Great advice on immutability, separation of concerns/domain logic, and side effects. I think modern languages or tooling should somehow enforce this. I think a lot of code will be better off for it. Specific patterns/zen (like another comment below) is what helps.
I too have to look the Moore/Mealy distinction up every time, but yes.
The real trick is finding all the state machines you've built in your program without noticing. Or are contained in dependencies. Parsers, TCP, SSL. Authentication is a state machine that usually has two states but may have more; places like Amazon have "we know who you are but you've not entered your password this session", for example. Anything involving message delivery. Anything transactional.
Complexity is often frivolously created during specification. If the true consequences of complexity (wherever it may live) were understood, we'd simplify things a lot. But they want that next button, they want ancient emails be searchable instead of having them archived, and they have no idea that their wish leads to new servers installed in a data center, new bugs introduced, and development time lost at for each new release.
There are also ways to design things better (or worse) with regards how they handle complexity. Reuse UI, patterns, workflows and languages. Keep things consistent, make things discoverable.
Point is, the slogan "Complexity has to live somewhere" could also be used as an excuse to do sloppy work. Then again, this keeps us all employed.
> Complexity is often frivolously created during specification. If the true consequences of complexity (wherever it may live) were understood, we'd simplify things a lot. But they want that next button, they want ancient emails be searchable instead of having them archived, and they have no idea that their wish leads to new servers installed in a data center, new bugs introduced, and development time lost at for each new release.
I think features are definitely worth servers being installed in a data center, and most people coming up with product requirements are fine with the monetary costs.
Features leading to bugs and wasted development time... that's where the point of the article lives. There are ways to build software that can more easily accommodate changes.
However, it's more common that developers try to hide the complexity of features behind abstractions that hinder more than they help. And that's what results in breakages and lost time.
I agree with this. At perhaps a slightly higher level, I've been toying around with the idea of "business process debt" to describe a different-but-similar concept. Basically that many type the business/product people keep adding new features, new methods, new nifty little ways to make money. And that's all fine, but they come with a certain amount of inherent complexity - sometimes quite a bit. Eventually that complexity will drown your development process, and it can look like a software problem even though it largely isn't.
Of course your software will also add some accidental complexity as well (perhaps a lot) but ultimately the software can't be any simpler than the business process it's trying to model. And if that process is a spaghetti nightmare, well, you're just stuck with that.
You may not mean it this way, but the your post makes it sound like complexity is created out of whole cloth (so to speak) by adding (in your scenario) searchable old emails.
And true to the article, that is only half of the equation. Complexity is not created as much as it is moved around. By adding code that makes old emails searchable instead of simply archiving them moves the business complexity of customers with this specific edge case to your code base.
It is then a business decision. And I agree with you that it is often not worth it, but the art of business is in understanding when taking on the added complexity is worth it to differentiate your product over your competitors. Or stated another way, strategically deciding to reduce the complexity of your customer's business and increasing the complexity of your own.
This is not always an obvious decision, hence why the developer rank and file often look at it from one side only, and the product manager's/sales staff often look at it from their side.
It takes a strong leader who understands both sides to make the right strategic decision. When this decision is not made thoughtfully you end up with an overburden of complexity for a subset of customers that don't really move the needle on your business.
I find this is a strong reason to build test cases as early as possible in the dev process. I find myself removing 'nice to have, but not strictly necessary' options all the time when I realize they'll double or worse the number of test cases.
I think the conversation has shifted slightly this decade to avoiding incidental (accidental) complexity (I didn’t realize Fred Brooks popularized the term!), which I wish the author would address.
Otherwise this essay is spot on when it comes to essential conplexity.
Incidentally, the question of “where complexity lives” is one of the focal points of “A Philosophy of Software Design,” which comes highly recommended if you’re trying to come up with your strategy for managing complexity from first principles.
I think the most contentious part of the post is that I just simply assert that people are an inherent part of software. You often avoid the incidental complexity in code by indirectly shifting it to the people working the software.
Their mental models and their understanding of everything is not fungible, but is still real and often what lets us shift the complexity outside of the software.
The teachings of disciplines like resilience engineering and models like naturalistic decision making is that this tacit knowledge and expertise can be surfaced, trained, and given the right environment to grow and gain effectiveness. It expresses itself in the active adaptation of organizations.
But as long as you look at the software as a system of its own that is independent from the people who use, write, and maintain it, it looks like the complexity just vanishes if it's not in the code.
I do think it's important to distinguish accidental and essential complexity. Some complexity is inherent and if you think you've eliminated it, all you have really done is made it someone else's problem.
But there is also a lot of complexity that is simply unnecessary and can be eliminated entirely with effort. Humans make mistakes and some of those mistakes end up in code. Software that does something that no one ever intended can be simplified by having that behavior removed.
Complexity is like energy, it cannot be created or destroyed, it is a fundamental property of any process or thing.
However, complexity can be managed. Humans do this using abstraction as a tool. We divide the complex problem up into a sequence of several simpler states, and we find it easier to understand the simpler problems along with the sequence within which they lie.
Good software uses this same approach to reduce complex issues to a manageable process. A good tool makes the simple things easy and the complex thing possible, the design of the tool reflects the effort and work that the designer out in to u the problem they’re trying to solve, and to produce something that helps guide others along that self-same path of understanding, without them having to put in the same level of effort; it establishes the golden path through the marshes and bogs of difficulties that the problem domain throws up.
“Embracing complexity” is a measure of last resort, IMHO. It means the tool developer could not analyze the problem and come up with a good solution; it means “here, you figure it out”; it means giving up on one of the fundamental reasons for the tools existence.
Sometimes, embracing complexity and the ensuing struggle that this necessitates is simply what you have to do, but not often. Maybe, maybe this is one of those times, but I always start off with a critical eye when someone tells me that a complicated thing is “the only way it can be done”. Colour me sceptical.
There is a distinction between necessary complexity and "busy beaver" artificial complexity. And it turns out that there's a pretty compelling mapping between energy and information and fundamental complexity. There's a reason they both use a concept of 'entropy'.
That said, there's no denying that there's probably a lot of unnecessary complexity in most software.
I'm sorry without some citations this is still handwavy bollocks.
Energy does have a conservation law. Entropy does not. Not does information, complexity etc.
I majored in phyiscs, I'm familiar with these terms.
Grand parent post made no indication they were talking about essential complexity when they made their bold assertion about how it's conserved like energy.
I'm also not aware of any formal or precise way of measuring "essential complexity". Not in the way we do for energy or entropy.
If only there were some scientific discipline that studied complexity and proved that problems possess some essential, minimal, complexity below which no implementation can go...
If anyone proved such a result, I'm sure they would be awarded a prize of some kind.
Well, there are many complexities. People here are talking about the Programming Complexity (of which one measure is Cyclomatic Complexity) of a textual program vs. the Time/Space complexity of the same program. There is the weakly-related concept of the Kolmogorov complexity of a string.
It's obvious that the time/space complexity of a program can stay fixed while you arbitrarily raise the programming complexity of a program by arbitrarily raising the number of branches that are rarely taken, an action that won't affect the asymptotic time complexity of a thing.
And sure, you can talk about the Kolmogorov complexity of a string of computer code, but the minimal string representation of that code is unlikely to be one that a programmer would describe as simple. Even minimizing the string that would behave as that of the original program is usually non-desirable.
Pick any of those. Given a problem, there is a minimal complexity to the programs that solve it. You can always raise the complexity, but not reduce it.
I'm confused by your response to ninjapenguin[0]. Were you agreeing with him or disagreeing or something else? I interpreted your response to be disagreement. I feel that your original style of phrasing "If only x etc. etc." is not easy to understand. I'm having a hard time placing your subsequent comments in context of that original response.
I was disagreeing with the statement that "Comparing [complexity] to something as fundamental as energy is pure bollocks." There is, in fact, a discipline that studies complexity -- computer science, and, in particular the field of complexity theory. Complexity was, in fact, found to be fundamental and "irreducible". Two people, Hartmanis and Stearns, did, in fact, discover that through a comparison to physics [1] in 1965, and for that discovery -- that led to the creation of complexity theory -- they won the Turing Award in 1993.
Not really. You can describe the complexity of "code understanding" as the computational complexity required to answer a question about it, say the time it takes to find it or the length of the proof -- both are common computational complexity measures. The proof complexity of determining that bubble-sort sorts is, indeed, shorter than that of Timsort. You could even talk about the simplest sorting algorithm as the one with the shortest proof. There is, indeed, a minimal proof complexity for sorting, just as with any other kind of computational complexity.
No, they're completely different uses of the term.
A lot of real-world CRUD code, for instance, does nothing of computational interest whatsoever, but it isn't simple, because it has to cope with real-world business-logic complexity. (And possibly a lot of complexity beyond the necessary complexity, due to poor design, changing goals slowly messing up the code, etc.)
> You can describe the complexity of "code understanding" as the computational complexity required to answer a question about it, say the time it takes to find it or the length of the proof
I don't follow. How are proofs relevant? A programmer having to wade through poorly-named functions and a lack of documentation, is not well modelled by computational complexity theory.
> You could even talk about the simplest sorting algorithm as the one with the shortest proof.
Who'd care? That bears little relation to either the complexity theoretic properties of the algorithm/problem, or to complexity in the informal sense.
They are one way of measuring how hard or easy it is to know something, and I define simple as easy to answer questions about, and complex as the opposite.
Introduce enterprise into any system and complexity goes over the roof ASAP - suddenly its not only your machine but TEAM of people, history, dashboards with metrics, logs, automation, security etc...
This one isn't even approaching any of it except dev side and CI.
I think the analogy to energy is pretty good. Like the FizzBuzz thing, you can also build a machine that has a lot of energy but doesn't accomplish anything useful.
The conservation of energy, specifically, is a misleading analogy. If you want an energy-related analogy, I think you will find a better one in entropy (especially given its close relationship to information.)
Complexity is like entropy, not energy. It increases, unless you spend a significant amount of energy removing it from a small area, and in the process increase the complexity somewhere else by a larger amount.
However, any degree of functionality requires at least a matching degree of complexity. In that sense, some complexity is 'essential' and that complexity does need to live somewhere in your project.
I see embracing complexity as acknowledging it exists, and attempting to represent the complexity in parts and layers.
There are always cross-cutting concerns that aren't easily transformed, or become that way due to code changes. I think this area is where managing complexity is most difficult.
In a certain sense complexity CAN be destroyed. Yes there is a minimal level of complexity that must exist, but given an entity with a purpose, in many cases it is possible to reduce the complexity of that entity without changing its' purpose.
Take for example this function which adds one to an integer:
function addOne(x: int) -> int {
return (x * 1) + 0 - x + x + 1;
}
can be reduced to:
function addOne(x: int) -> int {
return x + 1;
}
The examples are obvious but from the examples you can deduce that their are many instances in software engineering where such reductions can exist but are not obvious at all.
I phrase it as "The solution can be as simple as the problem but no simpler", which means, you can "create" complexity. We see this all the time in code, after all, we must account for this.
But you can't make the solution simpler than the problem. There's kind of a "by definition" aspect to this, so it's kinda circular, but it's still a valuable observation, because we also see this in code all the time. For instance, it plays a significant role in the leakiness of abstractions; that leakiness is typically the attempt to make a solution that is simpler than the entire problem space. You can be fine as long as your particular problem fits in the space of the leaky solutions, but once your problem pokes out of that you can be in big trouble.
I would rephrase this as “the solution can be as simple as your model of the problem but no simpler.” A better model, structure, implementation, or restatement of the problem could transform the dimensions of the problem space radically.
But I do agree with you in spirit of what you said, that there is some essential “hardness” of the problem that is part-and-parcel with the difficulty in stating or explaining the problem. I see a similar complexity in representing the problem in code or functions or algorithms etc. Then comes a related issue with reducibility; some concepts or problems cannot be broken down into smaller discrete parts. They are already as simple as they can be. This is ideal in a certain sense, in that the concepts can be explored with fewer assumptions. However when someone asks you to explain like they’re 5 years old, interpretation and perception starts to creep in through intuitions and abstractions.
I am certain this is some kind of fallacy, since "house" is an abstraction that would need to be broken down somehow. The result would not be a simple problem anymore. Sorry, no fancy words, just common sense.
It's not that simple. For one we don't have a formal definition of "simple" and secondly, even when we do have a definition of "simple" for certain contexts, we can never prove that something is at its' "simplest" state or not at its' "simplest" state.
For example, newtons laws of motion, seemingly is a simple solution to finding a formal method of describing motion. It seems to fit the problem but there is no way to verify that it is the simplest solution to the problem and we don't even really know what "simple" is.
We know through experimentation that relativity fits observations more accurately but even with relativity we don't know if it's the most "simple" solution or even if its' fully correct.
Because you cannot know or ever know whether a solution is at its' simplest form, there is ALWAYS an opportunity for a reduction to be discovered.
You absolutely cannot say that "The solution can be as simple as the problem but no simpler" because we don't know what "simple" is and we don't know whether or not a "simpler" solution may be discovered in the future.
Well, to use your x + 1 example, we can say that one is simpler than the other, even if we can't define "simple" in exact terms, or even measure it precisely.
(By the way, I just noticed that your first sentence is perfect for this topic. I suspect that it was accidental, but it's beautiful.)
So I think that simplicity is perhaps somewhat like encoding in Shannon's information theory. Even if you can calculate the minimum number of bits required to represent something, you can't necessarily devise an encoding that will represent it in that number of bits. The analogy isn't perfect, but it has the same feel of "pushing toward something that you can't actually reach". (I guess you said the same thing with your newton's laws analogy, so I'm kind of repeating your point here.)
But I disagree with your last paragraph. Even if we can't measure "simple", we can still kind of feel when complexity is there. If the user needs to be able to do 50 things, then you either need a program that lets the user do 50 things, or you need 50 programs (or something in between). If your program is going to let the user do 50 things, then there has to be a way for the user to specify which of those 50 things they want to do, and the code has to be able to do all 50 things. You might be able to come up with some commonalities, a unified framework to simplify things somewhat, but that can only take you so far. You're still going to have to have 50 buttons on the GUI, or 50 command-line parameters that you accept, or 50 different URLs that you interpret, or whatever.
Or, you can have 50 separate programs, and each of them can be much simpler. But then you have 50 programs, and the user has to figure out which one to use for each thing, and remember them all.
So I claim that you can kind of tell that it's there, even if you can't measure it, and that's good enough for the statement to be a useful thing to keep in mind, even if it's not something we can measure or calculate.
Just because it lacks a formal definition does not mean that somebody cannot say it. It's a human notion, not a formal one.
(Also: I'd suggest that it's pretty rude to tell somebody they "absolutely cannot say" something, and try to avoid telling you that you absolutely cannot say that somebody absolutely cannot say something)
Human notions and formal notions overlap. All formal notions are human because they are developed by humans but not all human notions are formal because formal implies internal consistency and not all humans are rational.
The difference between the two sets: Human intuition and Formal notions is the set of all notions that are contradictory or irrational. Humans can be contradictory or irrational but we should try to avoid it.
>(Also: I'd suggest that it's pretty rude to tell somebody they "absolutely cannot say" something, and try to avoid telling you that you absolutely cannot say that somebody absolutely cannot say something)
Maybe a better way to put it is: "Absolutely logically impossible' rather than "you absolutely cannot say." No offense intended.
There are formal definitions of complexity, of which Kolmogorov complexity is perhaps the best known.
This may not be the sort of complexity you are thinking of (perhaps you have in mind 'hard to understand', which is an important issue that is not necessarily captured by formal measures), and formal metrics measure solutions, not problems (or, at best, a specific expression of a problem.) We cannot often decide whether a given solution is the simplest possible, but there are several fallacies that do not follow from this fact, such as:
1) Therefore, there cannot be a simplest solution.
2) Therefore, there is ALWAYS an opportunity for a reduction to be discovered.
>1) Therefore, there cannot be a simplest solution.
I never said this, I said that you cannot verify that the current solution you have is the simplest solution. This is based off of science, nothing can be verified science. Because we treat software more as black box experiments rather than formal systems, verification of your solution as the simplest solution is impossible until you define formal axiomatic rules. But again, this is basically never done and Even than you'd have to deal with Godels incompleteness.
>2) Therefore, there is ALWAYS an opportunity for a reduction to be discovered.
There is always opportunity in the sense of "possibility" if a solution hasn't been formally verified to be its' simplest. 2 follows from the notion that complexity cannot be verified.
"The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Cantor's diagonal argument, Gödel's incompleteness theorem, and Turing's halting problem. In particular, no program P computing a lower bound for each text's Kolmogorov complexity can return a value essentially larger than P's own length (see section § Chaitin's incompleteness theorem); hence no single program can compute the exact Kolmogorov complexity for infinitely many texts."
The above states that the exact Kolmogorov complexity can never be computed for all texts and that this fact has been formally verified axiomatically.
Doesn't this support my overall point? If you can't verify the complexity of a piece of text than you can't verify if it's the least complex because you don't even know the complexity let alone whether its' the least complex.
It is not entirely clear what your original point was, as it apparently involves being certain that something is possible. Are you sure?
The possibility of there being a simpler solution does not justify your claim "there is ALWAYS an opportunity for a reduction to be discovered" (and you specifically capitalized ALWAYS so we wouldn't misunderstand you!)
It might not be what you meant, but it is what you wrote, and apparently what you are now trying to justify.
Furthermore, you were originally objecting to the statement that "the solution can be as simple as the problem but no simpler", but this is a reasonable statement given that, as we both seem to agree, there is no formal metric that captures the sort of complexity we are talking about here. Not being able to settle the issue formally would not mean that we cannot discuss it usefully, and a concept of complexity in which solutions can be simpler than problems would be a strange, and probably not useful, definition.
As others have pointed out, Fred Brooks wrote about the very useful, but informal, concepts of essential and accidental complexity, to make an important point. In being pedantic about the lack of a formal definition of such terms, you are avoiding (or, at least, unnecessarily complicating) the sort of issues he was discussing (and we are here).
>It is not entirely clear what your original point was. The possibility of there being a simpler solution does not justify your claim "there is ALWAYS an opportunity for a reduction to be discovered" (and you specifically capitalized ALWAYS so we wouldn't misunderstand you!)
Let me clarify. If simplicity of a solution cannot be verified then a probability of the existence of a simpler solution will ALAWYS be greater than zero. This is in line with what I wrote but hopefully more clear.
>Furthermore, you were originally objecting to the statement that "the solution can be as simple as the problem but no simpler", but this is a reasonable statement given that, as we both seem to agree,
No I don't agree. The statement is easily disproven with an example of a problem that is more complex than the solution:
>there is no formal metric that captures the sort of complexity we are talking about here
I disagree, fuzzy human intuition can always be formalized given enough thought. If the intuition is flawed and irrational, formalizations of such intuitions will make this evident and further our overall understanding.
But either way even with this fuzzy definition of complexity my example of Newtons laws of motion, seemingly the simplest solution to the laws of motion was not only not the most general or simple solution but ultimately "less correct" than relativity. It always seems as if it's the simplest solution but you actually never know. This is in line with my experience.
>In being pedantic about the lack of a formal definition of such terms, you are avoiding (or, at least, unnecessarily complicating) the sort of issues he was discussing (and we are here).
I'm not trying to be pedantic. I am just trying to say that there are tons of instances where something looks like it's the simplest solution but it is actually not and you can never really know either.
Formal proof that something cannot ever be verified to be in it's most simplest form is just the ultimate supporting pillar. We can talk in terms of opinions and vague human definitions but this kind of argument leads to conversations that are never ending. A formal proof moves the argument towards a final resolution. I actually wasn't expecting a formal proof to be introduced into this argument, but it was introduced from your wikipedia source.
Before you introduced that source, I always knew that verifying that a solution is the simplest possible solution is an impossible endeavor in terms of science. Your source introduced to me that it is also an impossible endeavor in terms of logic.
Of course the strategy to win an argument that has already been verified by formal proof is to move the argument out of the realm of formal proof which you are doing now. It's a valid strategy but I still disagree, even on those fuzzy informal grounds.
So to keep in line with the overall topic. Can more complexity always be destroyed for a given solution? Can complexity always be reduced for a given statement?
The formal answer is: We can absolutely never know.
The thing is, prior to Gödel, one might have said "a probability of the existence of a proof of the decidability of Peano arithmetic will ALAWYS be greater than zero", and, by your argument here, that would have been correct - but, of course, one would have been mistaken.
It is your persistent and insistent use of the word 'ALAWYS' that is complicating things, and things would have been simpler if you had written of possibilities rather than certainties in the first place. You have written a great deal arguing for something that is not in doubt.
>>Furthermore, you were originally objecting to the statement that "the solution can be as simple as the problem but no simpler", but this is a reasonable statement given that, as we both seem to agree,
>No I don't agree. The statement is easily disproven with an example of a problem that is more complex than the solution:
(By the way, it is rather amusing how you cut off my sentence in the middle in order to give it a meaning that the original does not have!)
Here we have an additional confusion. The solution to a travelling salesman problem is just an ordered list of vertices, but doing what you are doing here - just writing down the solution - completely misses the point.
> So to keep in line with the overall topic. Can more complexity always be destroyed for a given solution? Can complexity always be reduced for a given statement?
> The formal answer is: We can absolutely never know.
It is probably just as well that your "Solution: 0" example is beside the point, for if it were not, I would doubt your assertion that we can absolutely never know whether more complexity could always be destroyed for this given solution.
>The thing is, prior to Gödel, one might have said "a probability of the existence of a proof of the decidability of Peano arithmetic will ALAWYS be greater than zero", and, by your argument here, that would be correct - but, of course, one would have been mistaken.
There is a difference you're not seeing. The wikipedia article introduces the fact that it has been Proven that the complexity of a solution can Never be verified. This is equivalent to saying: "It has absolutely been proven that the probability of the existence of a simpler solution is greater than 0"
This is entirely different from what you're saying. It has never been proven that decidability of Peano arithmetic will ALAWYS be greater than zero. We can only make your statement given lack of knowledge; given more knowledge the statement may be untrue. My statement is saying that there is no amount of knowledge in the universe that will allow us to know whether a solution is in its' simplest state ever.
>It is your persistent and insistent use of the word 'ALAWYS' that is complicating things, and things would have been simpler if you had written of possibilities rather than certainties in the first place. You have written a great deal arguing for something that is not in doubt.
Then I clarified it when you specified that it wasn't clear. The reason why I used the term ALWAYS was to emphasize that it is PROVEN to be greater than zero. You didn't like it and said so and then I clarified what I meant.
>(By the way, it is rather amusing how you cut off my sentence in the middle in order to give it a meaning that the original does not have!)
It's not my intention to do this. If you feel it was done, just point it out and I will address it accordingly. If I detect any further malice in your replies I will end this conversation.
>Here we have an additional confusion. The solution to a travelling salesman problem is just an ordered list of vertices, but doing what you are doing here - just writing down the solution - completely misses the point.
That's a valid point. If we called "0" the "conclusion," than shouldn't the "solution" include the computations to arrive at the "conclusion"?
If so I would say that it the computations in the "solution" are still simpler than the problem overall:
problem = "1*2*3*4*5*6*7*2*38*3432*0*232*22*33453*221*334/2*3929330"
def solve(problem):
return 0 if '0' in problem else eval(problem)
If you look carefully at the computational steps of "solve." One branch should do less calculations then the total amount of numbers in the problem. Therefore even the computational steps are smaller/simpler than the total size of the problem.
This is equivalent to saying that there exists problems of size N where the Big Oh complexity of the solution is of O(M) where M < N, which is certainly true. I know I sort of slaughtered the meaning of Big-Oh here but you get it.
>It is probably just as well that your "Solution: 0" example is beside the point, for if it were not, I would doubt your assertion that we can absolutely never know whether more complexity could always be destroyed for this given solution.
The whole point of Kolmogorov complexity is precisely the concept of a minimal program for solving problems of a certain type. The uncomputability of Kolmogorov complexity is beside the point here, just as Turing's halting theorem does not prove that it is always possible that any given program will fail to halt.
> One branch should do less calculations then the total amount of numbers in the problem.
Judging the complexity of a solution by how it responds to a subset of problems is the same sort of thing as just stating the solution, as that subset is the only set of problems that we are considering. In this case, that set is "products of a set of numbers including at least one zero."
>The uncomputability of Kolmogorov complexity is beside the point here, just as Turing's halting theorem does not prove that it is always possible that any given program will fail to halt.
I never implied this. First off the halting problem only proves that you can never CALCULATE or PROVE whether a given program will fail to halt. It does not prove that it will always fail to halt or never fail to halt. I never said or implied otherwise.
The uncomputability of Kolmogorov means you can never CALCULATE or PROVE the simplest form of a program. It does not mean that such a form doesn't exist nor does it mean that we haven't randomly found such a form for a specific program. It just means that even if we have found something that we think is the the Kolmogorov complexity of a given program we can never verify it to be true.
This fact is sufficient to show the following can never proven be true:
"There exists a program where the Complexity can never be reduced or destroyed"
>Judging the complexity of a solution by how it responds to a subset of problems is the same sort of thing as just stating the solution, as that subset is the only set of problems that we are considering. In this case, that set is "products of a set of numbers including at least one zero."
But Kolmogorov complexity calculates exactly what you don't want to judge. See the definition:
>In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output.
From the definition you need a predefined input from the set of all inputs and a predefined language from the set of all languages.
In terms of algorithmic complexity theory, I can define a problem to be as narrow or as large as I want. The domain for the previous problem can the set of sets of numbers that contain one or more zeros.
Subsets from larger problems can be used to create other problems and those problems and all problems in general can be used to make statements about complexity all together. If a subset shows something is not True then you have shown that you cannot make that blanket statement about the subset or the parent set.
>I never implied this... I never said or implied otherwise.
This is a very odd way of responding. When someone makes a point, it does not necessarily carry the implication that you said otherwise.
> First off the halting problem only proves that you can never CALCULATE or PROVE whether a given program will fail to halt.
It does not prove that. There are programs that one can prove will halt, and also others that you can prove will not halt.
> This fact is sufficient to show the following can never proven be true: "There exists a program where the Complexity can never be reduced or destroyed"
This seems to be demonstrating the same sort of misunderstanding as over what the halting theorem proves. Even where we cannot compute the Kolmogorov complexity, it does not follow that there is not a minimal one. As I mentioned before, the whole idea of Kolmogorov complexity is predicated on there being programs of minimal complexity.
Going back to that Wikipedia article, we see that "The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one."
>This is a very odd way of responding. When someone makes a point, it does not necessarily carry the implication that you said otherwise.
First phrase was in response to your statement. Second phrase was a followup to my sentence that you dotted out (...). And you accused me of modifying your wording with bad intentions.
>It does not prove that. There are programs that one can prove will halt, and also others that you can prove will not halt.
Semantic error, The halting problem only proves that a program can never CALCULATE or PROVE that a given program will fail to halt. That was what I meant.
>This seems to be demonstrating the same sort of misunderstanding as over what the halting theorem proves. Even where we cannot compute the Kolmogorov complexity, it does not follow that there is not a minimal one. As I mentioned before, the whole idea of Kolmogorov complexity is predicated on there being programs of minimal complexity.
>Going back to that Wikipedia article, we see that "The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one."
No. Not a demonstration of misunderstanding of halting. It is a demonstration of misunderstanding of Kolmogorov as you only just introduced me to it today.
Either way given any code, you cannot determine whether it is the most optimal one even if there Does exist an optimal one as the theorem states. So the intent still stands, let me modify the statement to get the semantics correct while preserving the intent:
"For any program you can determine whether Complexity for that program can be reduced or destroyed"
The above statement is False.
So in all practicality even though a simplified statement always exist we can never know if we found it. So the possibility is always there and the intent of what I said is still preserved if not stronger , see initial reply of this post:
"The examples are obvious but from the examples you can deduce that their are many instances in software engineering where such reductions can exist but are not obvious at all."
Kolmogorov still offers axiomatic support for my statement. In fact it makes the statement stronger and changes it too:
"In all instances of software engineering reductions can exist, you can never determine otherwise."
I can't say for sure but I believe that the statement above applies not only to Kolmogorov complexity, but algorithmic complexity as well.
I take your point, but the complexity I’m talking about is what remains after you’ve done any readily applicable simplifications.
Any normal task has an inherent level of complexity (“add one” is a very simple task, so can be reduced down as you point out). “Invert a matrix” is somewhat less reducible. “Calculate a quaternion” even less so, etc.
We generally string dozens or hundreds of these simpler tasks into sequences to get what we want, and human understanding being what it is, there’s quite a difference in conceptual complexity between “fly the plane to [here]” and doing all that maths.
There’s probably a relationship to subitising here as well, the brain does seem to be able to process small groups of things, even if it knows those things represent larger concepts, better than larger groups of things. This is just speculation on my part though.
You may have seen it, it's in the same thread of conversation we're having.
Basically there's actually axiomatic proof that you can never know if you applied all possible applicable simplifications. Therefore you can never know whether or not more complexity can be destroyed or reduced.
I wonder if the author has read Out of the Tar Pit[1]¹. See, in particular, section 6.
Essentially, the author is, I think, arguing about what the paper calls "Essential complexity", complexity inherent to the problem one is trying to solve. And with that, I agree.
I think the author should acknowledge accidental complexity (or provide some argument as to why that must live somewhere), and I think a lot of comments here on HN are pointing out the fact that accidental complexity exists, and doesn't have to live somewhere. But my guess is that that's not what the author is saying, and that the author is only arguing about essential complexity.
¹I personally found this paper somewhat mixed. The definitions of complexity are what make it worth reading. Its conclusion of "functional programming languages will fix all the woes" I think is not practical.
Yes, there's definitely a kind of complexity that _doesn't_ stem from the problem domain, and it can often be eliminated.
Stuff like over-abstracting, insane default values, duplicated state that needs to be synchronized (e.g in React components) or just overly-repetitive code, for example.
> Yes, there's definitely a kind of complexity that _doesn't_ stem from the problem domain, and it can often be eliminated.
Yes, it can be eliminated, but only to some extent. Programming languages add a baseline of complexity by themselves that _can't_ be removed: programming languages are not infinitely flexible so there will always be problem characteristics (invariants, data structures, etc) that will not be easily expressible. Those are new sources of complexity.
Think how Java simplifies manual memory management when compared with C or C++. Or how the Rust borrow-checker provably prevents a whole class of bugs. But these are all tradeoffs, both Java and Rust are ill-suited to express other problems.
Manual memory management is accidental complexity unless very strict memory usage rules are part of the specification.
Some kinds of type systems similarly lead to accidental complexity, for example in the form of repetitive code because the type system makes sufficient abstraction impossible. It's possible for some kinds of requirements to mandate very concrete types with no room for abstraction in some parts of a program, but even then the compiler can do most of the work with making sufficiently abstract code concrete and performant.
I notice that there's a kind of complexity that seems to straddle the bounds, related to the way it easily turns into effort-reallocation politics. Would it be useful to consider “coordinative complexity” a separate type that's intermediary on the essential–accidental axis but also has other properties?
> Would it be useful to consider “coordinative complexity” a separate type that's intermediary on the essential–accidental axis but also has other properties?
The complexity of getting different pieces to talk to each other? I can see how that's only conditionally "essential" in that, if you have enough control over the environment, you can make the various distributed components as similar as possible, never deal with version skew, and never deal with varying interpretations of the same protocol standard.
OTOH, if you're writing a web page, and you have to make that page useful across multiple browser types and versions, some way of dealing with the vagaries of different browser implementations becomes an essential complexity: You either push it into a library or framework or you handle it in the main codebase, but something has to know what MSIE can and cannot handle.
Pretty much yes, though I was angling toward coordination difficulties between people creating complexity which is “accidental” from the perspective of the whole system but “essential” to individual actors given their incentives and what they can rely on, either for purely social-maneuvering reasons or for other reasons as well. And more specifically, how this happens even when the accidental complexity isn't really in line with the group's values or with the values of the individual actors. The way that this grounds out in a technical sense is scenarios like the one you describe, so I think we're talking about the same broad set of phenomena.
Further, there's different levels and causes of “coordination didn't happen”, such as “we tried, but got confused and didn't have time to understand each other properly” or “actively sabotaging everyone else's ability to deal with the aggregate system so that I can get ahead”… and somewhere in there is one that stands out to me as specifically interacting with technical complexity, along the lines of “I want my piece to not have to be the one that changes and/or takes on other costs”. Which is part of the social dynamic of moving externally-essential complexity around within the aggregate system and potentially creating ripples of accidental complexity along the way.
(Whether and how this also happens when the information systems in question are human minds rather than software components might be relevant but also an exercise for the reader…)
“The trap is insidious in software architecture. When we adopt something like microservices, we try to make it so that each service is individually simple. But unless this simplicity is so constraining that your actual application inherits it and is forced into simplicity, it still has to go somewhere. If it's not in the individual microservices, then where is it?”
The failing of this point is that much of what we call complexity is disorganization. Certainly, there is a fundamental level of logic in any desired system that can not be willed away with cute patterns, but to consider all complexity of an existing system to be necessary is a fallacy. Dividing systems into problem domains does not inherently reduce the total complexity of a system. It probably usually adds to it. But organizing systems this way can drastically reduce the scope of complexity into manageable pieces so that mere mortals can work on it with out having to hold the entire system in their mind at one time.
It’s like saying that you can’t make a garage full of junk any less complicated, because no matter how you arrange it, it will still contain all the same junk. In fact, organizing all the junk into manageable storage can make it much easier to understand, work with, sort through, clean, and identify items that may be unnecessary.
Indeed, but we need to differentiate between the inherent complexity of the problem (which can't be mitigated, but only shifted around as the article points out), and the incidental complexity added when we create the solution.
The inherent complexity can be taken on by the solution (and the inherent complexity may be hidden behind a simple interface, or it may be introduced in the interface, but that's another problem), or it gets removed of the problem domain, and then needs to be dealt with by the user.
There's no correct answer; half the problem is defining the appropriate problem domain, and even then there are only better or worse solutions. The incidental complexity of the system comes purely down to the implementation, which often comes down to how well the problem domain is defined in the first place.
Yes. That’s exactly what I’m saying. Although, from the way I read the article, it doesn’t seem to acknowledge the possibility for significant overhead in a chosen implementation. Sounds like they’re saying, no matter what you choose, it’s all the same. Don’t even try.
While I agree with you, I think of it more like moving the complexity more than removing it. The simplest way to store junk is to throw it all in one space. This is categorically the simplest thing to do. However, your complexity is then moved to the point of retrieval.
If you want to reduce the complexity of retrieving items out of the garage, you can move some of that complexity to when you store things in the garage by organizing them in a certain way. Without a doubt this is more complex when storing, but we would all agree that it is a creates a good balance in complexity between tasks.
However, I would argue that if you ever only store 10 items in a garage, then spending time organizing them in a way that reduces retrieval time would be an utter waste of time, and hence (to completely jump out of my/our analogy) why a good business person takes all of this into consideration when making decisions on how to structure their code and business and what complexity to take on.
Along the same lines, there's a great quote from many years ago that I unfortunately can't find the exact text of, but it goes like this (paraphrasing):
"Most Microsoft Word users only use 5% of its features."
"So why don't we get rid of the other 95%, since it's so bloated and complex?"
As every discussion about bullshit jobs or frontend frameworks, or hardware abstraction VMs (recently, WASM) will easily show, we are swimming in accidental complexity. And even when it's not obvious, every general advance on science or technology is fundamentally the removal of some complexity that everybody just adapted to like if it was essential.
So, no, it doesn't have to live somewhere. It can be created and destroyed, this happens every day. Probably some of it can not be destroyed, but nobody knows what part so any article about it will be useless.
I don't disagree with the message, but I'm kinda ambivalent about putting the focus on it like that.
> We try to get rid of the complexity, control it, and seek simplicity.
Well, not really enough. Complexity is a beast that takes many years to understand, and that's when you are really trying. Many devs don't. And companies even less. So while it's true some complexity is unavoidable, I think we still have a long way to go in being aware of it first. We write software once, and that's not a proper strategy to manage complexity. Anyone who has looked into computers from top to bottom knows we have tons of problems with complexity that we really haven't solved properly, and that bleed into day to day development in the ugliest ways.
My particular take when I don't have the massive amount of time required to properly deal with complexity is write something like this: "hey, this is very complex. please don't touch it. if you have to, we have this much headroom. if you need to go beyond that, please rewrite this entirely / find a better way. if the code doesn't bite you immediately I will".
One of my highest upvoted comments with 161 upvotes.
But I've come to another idea too. Part of all this complexity is dealing with change. We could simplify things if some aspects of our software hit a final point where only security updates were published after that. Imagine a programming language that was specified and actually finished without the constant roll of changing patterns and practices. Or a web framework. Or a database. Intentionally designed to be robust and secure from the first day then intentionally set to minimal patches for bugs and security fixes.
Part of the issue though is that things keep changing. New characters are added to unicode, new timezones emerge or existing ones change. It's a hard problem to crack.
An E-program is written to perform some real-world activity; how it should behave is strongly linked to the environment in which it runs, and such a program needs to adapt to varying requirements and circumstances in that environment
Furthermore, as a part of its own environment, the software's own existence has an impact on what people expect and want to do with it.
> Imagine a programming language that was specified and actually finished
Many programming languages like that exist: ANSI C, Pascal, PHP 1.0, Ruby 1.0, Python 1.0, etc. No one uses them. They could simply choose not to upgrade, but they don't.
If you assume people, on aggregate, are reasonable, then it seems that the value of changing languages outweighs the cost of the churn.
If you designed a mainframe in the 70's, it didn't change. And so we are still running mainframes from the 70's today for critical infrastructure, government, financial, and educational work.
This is especially evident for those who used golang for non-trivial projects. Because the language has very poor support for abstractions and higher level constructs, you end up with much more verbose, brittle code that's harder to modify and read. IMO, the language designers, by optimizing for a language that's relatively easy to pick up* have pushed the complexity onto the programmer.
* Any language has its paradigms that need to be learned, and golang is no different. Just because you can learn the syntax in a couple of sittings does not mean you know how to use the language in an effective matter. Something I see many people not bring up.
Another common example I've seen cited is the need for Go code generation tools in the community (lack of generics pushing the complexity to external tools).
That is not my experience from using Go for nearly 10 years at all. Go actually has significantly removed the complexity of writing asynchronous and concurrent programs and has pushed it to its own runtime.
At this moment in time, as a language nerd, there are few languages that I know of that compete with Go in the balance it strikes between performance and complexity.
"goroutines" is probably the only thing that golang has going for it. But today with async/await in languages like Rust and C#, C++ getting a coroutine implementation, and Java getting fibers, golang is no longer special in anyway in this domain, while it still carries the baggage of a non-expressive verbose language, and performing worse than the aforementioned.
Go is special in that it is significantly simpler than all of these languages, while having a large overlapping domain of operation with each.
I'm yet to see a single real-life money-making program written in any of the languages you cited that is as easily readable as a Go program.
This has been something living in my head for a long time, but the largest problem is that the complexity needs to live in different places for different sizes of problem. For example, for simple systems, the build tool that forces a limit to complexity might be the right one, because the exceptions will be small and well-understood. For a large system, you may have a team that is dedicated to the build tool itself, and as such the compromises are different.
If you are lucky, you have someone who is able to analyze the entire system, can identify all of the stakeholders, and can drive consensus when the change in abstractions is necessary, and has the budget to do it.
The danger is that people pick the solution for a large solution first because "we will need this someday" rather than waiting for the accidental complexity to build, knowing that the rework necessary to move to an intermediate system is less expensive than the cost of delay in getting the simple solution out first.
My dream is that we can build incremental complexity systems that could support simple solutions quickly and highly complex solutions eventually. The problem is that these are hard to build. :)
The author seems to be starting with the premise that complexity is necessary.
I'm not sure I'd agree. Some complexity exists because the effort to simplify was too great. Either cost or skill prevented the refactor. As the saying goes, "sorry this letter is so long, I didn't have time to make it shorter."
I've repeatedly found that the first iteration of a solution tends to be more complex. One culprit is that sufficient abstractions were either not recognized or not implemented. Put those abstractions in, and you simplify the system.
So that refactoring step after an initial solution is created is crucial and unfortunately often just not done.
It depends on the kind of work. Systems integrations are often complex for uncontrollable reasons. That one legacy system only speaks ASN.1, not JSON; the three vendor systems have different semantics for a fundamental concept (vendor, order, shipment); failure modes are under-documented, so defensive programming is needed; etc.
If we accept that the integration itself is necessary, then the complexity necessarily comes along with it. Not to mention all the complexity generated by the client base: I'm paying for this system, it's mine, so make it do X, even if you think X is too complicated/impossible/etc.
I know. Lets move it all that annoying complexity to yaml/json/XML config files!
Now look at this app I can write in 30 lines of code with this framework! Oh and 500 lines of yaml across 2 dozen files but look the "code" is so sleek and sexy.
Joking aside I think he could have at least mentioned the difference between required complexity and unneeded complexity. Ironically it seems to spawn from attempts to reduce the first type of required complexity in a lot of cases.
The title is catchy and it's not exactly wrong. There's an element of truth to it. And yet, 99% of the times I hear it, it's as an excuse for crap, not some fundamental truth.
Dirt has to live somewhere, too. That's a good motto for a garbage company, but if you hear a chef saying it in the kitchen while preparing your meal, you might be worried.
Yep. Most of the "complexity" that is justified by mantras like this is not necessary complexity (like the inherent baseline complexity in solving a certain problem space) but cruft/enterprise/poor coding style that winds up suffocating a project. That kind of complexity should be torn out with disregard for the notion that it's somehow necessary.
I would differentiate between intrinsic complexity of a problem and artifical complexity of a solution.
Certain knots can be solved without changing the topology! They would just add artifical complexity.
Also, some truths can have both very simple and very involved proofs. They are a perfect example how just the right approach can reduce much complexity.
Just formulating the problem in a different way can already reduce much artifical complexity.
It is not about simplicity versus complexity. It is about creating simple abstractions that successfully hide the contained complexity.
When I want to search for a string within some data, I don't want to have to think about the algorithm it uses (naive, Boyer–Moore [1], etc.). I just want to know that somebody cared about it and that it will work great.
Bad abstractions, on the other hand, make you think about the underlying details that you have to be aware of. For one layer that might not be too bad, but the more you build on top of such things, the more fragile the whole construct becomes.
This is missing their point. If you use a simple abstraction, you won't be able to address all of the complexity of the underlying task. You'll be sacrificing at some point, either by leaving performance on the table or by ignoring an edge case and having it come back to bite you later. "it will work great" can only go so far without cooperation from the user, and then the complexity starts creeping back in.
I just read the text a 2nd time and I disagree with you.
The point of the text is, that you have to define where to put the complexity:
If you are lucky, it lives in well-defined places.
Otherwise, it will be everywhere (obviously bad):
With nowhere to go, it has to roam everywhere in your system, both in your code and in people's heads.
My point, on the other hand, is that you should focus on good abstractions, which is just one more step into the direction of defining 'where the complexity should live'.
When defining the abstraction you focus on the practical use-cases so that it is as simple as possible but not simpler. The main task here is to find the right focus. In doubt, it might be better to use multiple different abstractions (e.g. if there are two distinct use-cases).
The goal is to find a construct where everything looks simple from the outside because the complexity is hidden by the abstraction.
I agree that abstraction is a useful tool in tackling complexity, but in my experience it is often the only tool people reach for, which can lead to a continuous rat chase of refactoring after refactoring, because somehow the ugly can never quite be fully hidden.
This is exactly what the text talks about: shuffling of the complexity around, trying to hide it, and the end result can often be that you just don't know where it ended up. Personally I think a lot of this is due to the way software engineering ideas like "don't repeat yourself" or "design patterns" can attain an almost cult-like following, and people just blindly apply transformations instead of deeply looking at what the actual problem is.
As others mentioned in the rest of this thread, complexity must be addressed, and only from a perspective of understanding what your problem is, where the traps are, what all of its complexities are, can you then come up with abstractions that will be any good.
It could be that your complexity is necessary, like a small widget that just has an enormous amount of variety that it has to constantly account for. Or it could be that you designed one giant widget made of 10 widgets, and a re-design could simplify how you think about and build it.
Simplicity is the art of taking complex things and stripping them down to their essential nature. Simple things can be complex, but no more complex than they have to be.
With that said, this piece talks a lot about frameworks and tools and patterns, and I really hope people don't think that's the right way to think about architecture. Abstractions do not make things simpler, they make them abstract.
While I agree with the message about essential complexity, I disagree that all complexity is essential. In fact, I think it let's us off the hook far too easily to just claim all complexity is essential. It's easy to invent complexity, in fact it's the default. I'd go as far as to say that about 50% of the complexity in software is non-essential. And in terms of what to address, it's a bit like paying off debt - just like you should always pay your credit card first, always start by eliminating the non-essential complexity.
Paraphrasing the "Fundamental theorem of software engineering", Every problem can be solved with enough levels of indirection. Every level of indirection adds complexity, so the question should be: is the additional level worth it?
I am fully on board with the overall sentiment of the article that there is some irreducible complexity that one cannot avoid. Often it comes straight from the business domain or entity the code is modelling, and then, sure, you cannot make it any easier, otherwise you are not solving the problem and users will be unhappy.
However, then the author goes too far:
> Accidental complexity is just essential complexity that shows its age.
You can absolutely add heaps of unnecessary complexity; in fact, this is almost surely what you will get by attempting to cover every possible future use case and evolution scenario, or to cater to every minor aesthetic concern (e.g., "I need to be able to replace my cloud provider / DB / message queue / user notification medium / etc by changing just one line!").
It takes humility to admit that getting the balance "right" from the start is difficult, and often the only way to improve is to accept some badness now and revisit the decision later with more information.
The quote however seems to be saying that we (as software engineers) always get it right, just later things change and our choice does not appear right anymore. This mindset is counterproductive, as admitting flaws is the first step to improve.
Certainly you can make things more complex than they need to be (for your given use case). That is self-evident for most engineers.
Doesn't that hold that the inverse must be true? In some cases, you must be able to remove complexity from the equation, without introducing it somewhere else.
I agree with the spirit of the post, but I think it's eliding some of the complexity involved in identifying essential complexity.
I find there are some concepts used to qualify complexity and data-model fit in information theory and machine learning that can also be used to think about product-market fit or software-domain fit.
I tried to describe them here. I'm not sure I did a good job:
Low resolution fit: The product vaguely and simply fits its market or domain.
High resolution fit: The product is complex and is tightly tailored to the market or domain.
Overfit: The product is so tightly tailored to some users that it only fits a small part of the market.
Underfit: The product is so generic that it’s missing important features and corner cases.
And then another qualifier 'crossfit' which is a bit harder to grasp, inspired by cross-entropy, that has to do with whether the design language, which can be seen in some ways as 'encoding' the problem, fits the domain or market well.
Kolmogorov complexity is uncomputable. What that means in this case is that we may never be 100% sure if we have eliminated all unnecessary complexity in our programs; if we have expressed things in the simplest way. The complexity may have to live somewhere or... or there may be a simple underlying principle we have just not seen yet.
There is still a huge difference in the amount of complexity in a solution and the amount of complexity in the computer system implementing that solution. In extremely simple cases, such as implementing a mathematical formula, the solution (in the form of words or an equation) and the implementation (actual code) could be on the same size order - there is virtually no accidental complexity. For business solutions, like a shopping web site, the difference could be a factor of 10,000 or more. But that's almost all accidental complexity, because of how organically the web and most programming languages have grown to handle cultural concepts like language, visual flow, authentication, encryption, cross-platform compatibility and so on.
I think that things only become less complex when it becomes old and reliable. Even if the internals of the thing are a fucking mess. For example, Linux, in its entirety is a complex fucking beast for its maintainers.
Yet for other people it's a simple as fuck thing that accomplishes things for them and lets them move forward with getting shit done.
So really, it's how you look at it.
The human body is so freaking complex, it is entirely possible we'll never even understand how half of it works. Yet from another point of view we easily hire a bunch of humans to do tasks (programming, making food, yard work, etc..). And even though we don't know how the body works, we have a simple understanding of how that body can accomplish those tasks, which is enough to move forward.
I think “complexity has to live somewhere” is just the definition of essential complexity. “Accidental complexity is just essential complexity showing its age” makes no sense to me: accidental complexity is often the result of ignoring essential complexity (or using inadequate tools). One often finds that edge cases can be naturally handled by principled approaches that capture the essential complexity of a system (see: Paxos). The alternative is a naive, simplistic approach that ignores essential complexity and quickly accumulates accidental complexity in the form of workarounds for newly-discovered edge cases. In other words, most software systems.
I think I understand the point the author is trying to make, but I think they end up accidentally making too strong of a claim.
Sometimes there is /essential complexity/, but it is far from always. I think the line of thinking espoused by the author leads to less simplification and encourages a "well, the complexity has to live somewhere, so why bother?" type of approach. However, reducing complexity is often possible, and it usually comes before the implementation stage. Reducing complexity sometimes means solving a different problem.
Of course it does. It lives in your frameworks and libraries. It lives in your container classes and threading routines. It lives in that bit of code at the core of your business rules that people are scared of.
If you write anything, even "hello, world", complexity lives somewhere! Just the complexity of the print function path all the way to the hardware would baffle some.
Very interesting. I'm happy I read it. I think that seeking simplicity in code for me is often trying to find the most simple and straightforward way to write the code so that I can understand it. This usually boils down to good naming, and good encapsulation of tackling specific problems into specific (ideally well named) places.
It is nice that we recognize that there is certain amount of necessary complexity and the skill involved is complexity management. But it is a pity that we don't recognize that 1. we can create extra complexity that often wraps around necessary complexity; 2. There is such thing that is called scope.
The strategy recommended at the end with the “embrace it, give it the place it deserves” feels like the “eating your shadow” of design. It's also what I've been trying to approximate most of the time, and I appreciate the representation of it here.
This is one of the best titles I've seen this year. Reminds me of G.G.Márquez, who told how much he sweated over the first sentence. I haven't read the post yet, but I feel the discussion has already started.
Some complexity cannot be avoided, but that doesn't mean that complexity is a constant and removing it in one place necessarily means that it has to go somewhere else.
The real question of design, especially the design and organization of computer programs, is where that complexity lives, and how do you organize that complexity so that it doesn't introduce extra complexity.
Usually the way we handle this is through layers, with the initial layers being simple, the middle layers being complex and the top layers being simple again. There's no theory about why this is better, it is just usually how it's done and it seems to work when we can pull it off.
The bottom layers are your primitives and axioms. Simple building blocks. This is where designers want to create a most minimal and simple set of tools that can help facilitate unlimited complexity at the upper layers (Think a minimal set of language primitives that allow for Turing completeness)
The middle layers are your theorems. Different permutations and compositions of your axioms to implement things like business logic or whatever logic you want. This is where good designers should try to stuff as much complexity into as possible.
The Top and final layer is the interface. Something that is built to directly access the middle layer while providing an interface that is simple and more understandable from the perspective of a user. This is probably the least theoretical aspect of the stack as you have to factor in things like art and human psychology into how you build an interface.
A good example of this will be your phone. The primitives are assembly language instructions, RISC. The middle layer is all the code that facilitates programming and applications and the interface layer is the touch screen.
Note that you don't have to look at it from a birds eye view from CPU instructions all the way to touch screen. You can zoom in and look at just a web stack: Java on the backend all the way up to react on the front end.
By zooming in you will see systems that violate the design principle I describe. Java as a primitive has become incredibly bloated and so has React and the javascript tool chain. There are several reasons for why this happens: Lack of planning, optimization requirements inevitably violate design principles, lack of knowledge, lack of central direction and more.
Either way, always remember that the ideal design that most people should strive for is complexity sandwiched by simplicity at the primitive layer and the interface layer. This goes for most things within computer programming and outside of it, like your car.
> The fact that complexity has to exist is obvious.
I can't tell you how many engineers and designers I've worked with for whom it is not obvious.
Unfortunately, the lesson that complexity has to exist seemingly needs to be re-taught constantly.
A lot of people really truly believe that a beautiful, minimalist, elegant product is always the best solution -- and that the users needing it to do other/extra things are the ones who are wrong.
I'm not really talking about that. I'm saying that it's obvious in the sense that this English sentence has a certain level of complexity... In fact for anything to exist it must have some level of complexity... that much is obvious.
The real question is what to do with complexity. Like you said some people think that complexity should be pushed onto users other people like you and I disagree.
If you're building real systems, it is not the case that complexity of the problem is obvious. I've just spent several years in an organization where inherent complexity was discounted by important technical decision makers in the organization.
The result is a mess since the system was not designed to cope with the operational complexity involved.
One of the reasons to study category theory is that it holds out promise of a way, mathematically rigorous, to discover and describe the essential complexity of a task or problem.
So I took 2 days, made a chart of every path and pattern of events (restarting a timer from a timer callback; having another time interval expire while processing the previous; restarting while processing and another interval expires; restarted timer expires before completing previous expiration and on and on). Then writing exhaustive code to deal with every case. Then running every degenerate case in test code until it survived for hours.
It never had to be addressed again. But it did have to be addressed. So many folks are unwilling to face the music with complexity.