What the Four Color Theorem Can Teach Us About Writing Software (alexkudlick.com)
201 points by magoghm on Jan 11, 2018 | 54 comments



Interesting article, and a good analogy. I've recently taken to comparing software to physical systems. A physical interconnect can only connect so many things (think of the steering column in your car) due to the laws of physics (there's only so much physical space). Software allows us to act without this constraint: each object is a point that can connect to, and be connected to by, an arbitrary number of other objects directly. This potential complexity must be actively fought against in order to keep the system maintainable and comprehensible.

I actually thought this was going to go in another direction. When the proof of the four color theorem was published, there was a good deal of controversy about it. It had not been fully verified by a person; the bulk of it was generated by computers. The method of generating the elements of the proof was vetted, but not the final output. This has some relation to the issues we have now with machine learning, where systems are built on generated models that we don't fully understand. It also relates to the more complex chains of compilation we use these days, where each layer of translation introduces the potential for subtle errors or malicious action.


Well, the four color proof might have been controversial, but five colors is something that can be proven by hand. Still, four or five doesn't matter for the points in the article. Keeping fewer connections between objects/modules is something we should all strive for.


I think the 4-color theorem teaches us that sometimes software has to be shit. A lot of people held out for an elegant proof, but what they got was auto-generated spaghetti. I cannot think of a better demonstration of the fact that sometimes you just have to write a lot of boring code and solve all the cases than the fact that an important theorem in mathematics was solved this way.


It's exactly that kind of code that most benefits from discipline about interfaces. Writing a few thousand lines of code to do something hairy is quite feasible and indeed fun -- as long as the result exports a simple, non-leaky interface.

But if the interface leaks: e.g. if it alters some state, if some edge cases aren't covered, or if some of its internal dependencies have to be visible on the outside, then the mess ramifies through your whole system and is a burden on any code you write, change or read.

Mathematical code often fits this model well. Implementations are frequently very hairy because of the calculations involved, plus the mathematically inelegant bells and whistles required by the real world. But since these things tend to be calculations, their interfaces are often no more than "call this function, get a result or an error, with no side-effects". Very clean.


Nice analogy, I wouldn't have thought of that. But I'm unconvinced that this is a good measure of software complexity; for one, there are only three nontrivial complexity levels (2, 3, 4).

But it's also unstable: a cycle with an odd number of nodes (e.g. a triangle) requires three colours, while a cycle with an even number of nodes needs only two, even though both models seem similarly complex.


This analogy has some strength, but one of the most obvious complaints is as follows. Having everything go through one central bus (e.g. `bus.setItem` and `bus.getItem`) requires only two colors, one for the bus and one for everything else. But of course, that just hides all of the complexity.
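To make the bus observation concrete, here's a minimal Python sketch (the greedy coloring and module names are toy illustrations, not from the article): a hub-and-spoke graph needs only two colors no matter how many modules hang off the bus.

```python
def greedy_color(adj):
    """Assign each node the smallest color not used by an
    already-colored neighbor (a simple greedy heuristic)."""
    colors = {}
    for node in adj:
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

# A "bus" architecture: every module talks only to the central bus.
n = 10
graph = {"bus": [f"mod{i}" for i in range(n)]}
for i in range(n):
    graph[f"mod{i}"] = ["bus"]

coloring = greedy_color(graph)
print(len(set(coloring.values())))  # 2
```

Two colors, even for 10 (or 10,000) modules -- which is exactly why the color count alone can hide the real complexity.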

Nonetheless, focusing on reducing complexity and simplifying dependency graphs is good, mmkay.


You can even get it down to one color by writing a big ball of mud (https://en.m.wikipedia.org/wiki/Big_ball_of_mud)


It's important to remember that complexity matters at all levels of abstraction, and you don't improve things by merely moving it from the level you are looking at to one you are ignoring.

Simplifying your dep graph by writing one big ball of mud means pushing the complexity down. Instead of a complex module graph, you have a complex dependency graph between the variables/functions/whatever in your mudball. Oversimplify your individual modules and you complexify your module graph, or even your production infrastructure (one-line microservices, anyone?).

Forced to choose, I would prefer a mudball to dependency hell. But really I would like the complexity to be intelligently and somewhat evenly distributed to each level.


The complexity inside an interface should also be considered -- how the components of the 'bus' interact with each other with one connection to the outside world.

If the 'bus' is simple enough to need few colors, then great.

Same goes for the sibling comment regarding a ball of mud. It makes the analysis more complex but you should consider every global variable as connected to every other global in constructing the graph. Balls of mud, in my thinking, then need nearly as many colors as they have globals since every component theoretically interacts with every other. And when I mixed all the water colors as a kid, I was always disappointed to get a muddy color.


(2, 3, 4) only applies to planar graphs, which the graph of a code base need not be. E.g. if you have n classes which all pairwise talk to each other, you'll need n colors.


Yes. He has a point with the circle though.


> while a cycle with an even number of nodes needs only two, even though both models seem similarly complex.

This is touched on in the article. The triangle is a complete graph: each vertex is connected to every other. A 2-colored circular graph would not be a complete graph. It would have a ring or fan-out shape.

You could have a graph with 3 vertices that is 2-colored, you just can't have each vertex connected to every other one.

I don't have great intuition for graphs but I'm pretty sure something like a square or larger cannot be both complete and planar.

Or to bring it back to software, I think it's true that a small system where each piece can directly talk to each other piece is more complex than a large system where pieces only communicate with their neighbors.


I was talking about a cycle in which only neighbouring nodes are connected:

      +---+
     /     \
    +       +
     \     /
      +---+
For 6, for example. This thing is two-colourable, while one with 5 nodes is only three-colourable. That's the instability I was referring to.
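The odd/even behaviour is easy to check mechanically; a small Python sketch (greedy coloring on ring graphs, purely for illustration):

```python
def greedy_color(adj):
    """Assign each node the smallest color not used by an
    already-colored neighbor."""
    colors = {}
    for node in adj:
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

def cycle(n):
    """Cycle graph C_n: node i connects to its two ring neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

even = greedy_color(cycle(6))
odd = greedy_color(cycle(5))
print(len(set(even.values())), len(set(odd.values())))  # 2 3
```

Adding or removing a single node flips the answer between 2 and 3, which is the instability the parent comment describes.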


> I don't have great intuition for graphs but I'm pretty sure something like a square or larger cannot be both complete and planar.

A square can, (you put one node in the middle), but 5 and up can't be both complete and planar.


> A square can, (you put one node in the middle)

Clever!


> there are only three nontrivial complexity levels (2, 3, 4)

The four colour limit is only for planar graphs. Dependency graphs need not be planar.

Also there is a very well understood connection between software complexity and graph colourings: register allocation. If you model variables as vertices, and place edges between variables that are required at the same time, then assigning locations to variables is a graph colouring.

I think this might generalise: the number of graph colours tells you something about the maximum number of things you need to keep track of simultaneously in order to deal with one locality in the graph.
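A toy version of that register-allocation view, with made-up live ranges (real allocators use far more sophisticated heuristics; this is just the interference-graph-coloring idea in miniature):

```python
# Hypothetical live ranges (start, end) for five variables, as in a
# linear-scan view of a small function. Two overlapping ranges
# "interfere" and cannot share a register.
live = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (5, 8), "e": (7, 9)}

def interferes(r1, r2):
    # Half-open intervals overlap iff each starts before the other ends.
    return r1[0] < r2[1] and r2[0] < r1[1]

# Interference graph: edge between any two variables live at the same time.
adj = {v: [w for w in live if w != v and interferes(live[v], live[w])]
       for v in live}

def greedy_color(adj):
    """Greedy coloring: each color class is a set of variables
    that can share one register."""
    colors = {}
    for node in adj:
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

regs = greedy_color(adj)
print(max(regs.values()) + 1)  # 3 registers suffice for these five variables
```

The number of colors is exactly "how many values you must hold at once", which matches the intuition about simultaneous things to track.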


> there are only three nontrivial complexity levels (2, 3, 4).

That's only true if your dependency graph is planar. That said, the sentiment is spot on: you can get away with arbitrarily complicated code as long as you avoid embedding K_3,3 and K_5 in your dependency graph.


>But I'm unconvinced that this is a good measure of software complexity; for one, there are only three nontrivial complexity levels (2, 3, 4)

But can we even reason beyond 4 levels? This point made a lot of sense to me. My intuition is that this comes down to our fundamental inability to generally solve quintics. There are also topological proofs of this which fit the author's point perfectly [0]. 4th degree is the limit of formal proof.

[0] https://en.wikipedia.org/wiki/Topological_Galois_theory


I understand and appreciate the analogy the author made. I wish more people thought about code complexity from first principles.

That being said, I doubt chromatic number is the right measure for software dependency graphs. Especially when there are hundreds of graph complexity metrics (tree-width, centrality measures, measures based on cuts and flows, etc.). Even just now for the first time I saw this thing called cyclomatic complexity: https://en.wikipedia.org/wiki/Cyclomatic_complexity
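For what it's worth, cyclomatic complexity is just M = E - N + 2P on the control-flow graph; a tiny illustration with a hypothetical CFG for a function containing one if/else and one loop:

```python
# Cyclomatic complexity M = E - N + 2P, where E = edges, N = nodes,
# P = connected components of the control-flow graph.
# Hypothetical CFG: entry -> if/else -> loop -> exit.
edges = [
    ("entry", "if"), ("if", "then"), ("if", "else"),
    ("then", "loop"), ("else", "loop"),
    ("loop", "body"), ("body", "loop"), ("loop", "exit"),
]
nodes = {n for e in edges for n in e}
M = len(edges) - len(nodes) + 2 * 1  # P = 1: a single function
print(M)  # 3: one base path, +1 for the branch, +1 for the loop
```

Note that this is a metric on the control-flow graph *within* a function, whereas the article's coloring argument is about the dependency graph *between* modules; they measure complexity at different levels.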

I'd be interested to hear from someone who has experience applying graph metrics to static code analysis in practice. What's useful and informative?


Cyclomatic Complexity is pretty old: as an analyst on a large system (99.99% COBOL), I had to test "each linearly independent path through the program" in 1989, and it had already achieved "old standard" status by then.


Excellent thought piece - wish HN was filled with more original analogies like this one.

This reminds me a bit of domain driven contexts (something just added to Elixir/Phoenix 1.3) - reducing the surface area of sub nodes from each other by exposing that single point interface.


I’ve been wondering if I can do away with the web layer altogether and provide RPC directly to certain contexts fairly trivially over web sockets. REST feels really tired when working with state containers IMO.


This seemed very promising, protobuffers to automated typescript over gRPC.

https://github.com/improbable-eng/grpc-web


Quick addendum: although the four color theorem proof always seems to require a case analysis at some point, proving that planar graphs admit 5-colorings can be done with a very short proof. Not particularly relevant to the analogy in the post, but if you want proof that planar graphs admit constant-sized colorings that's the one for you.


Just to advertise this addendum a bit more!

The Kempe-chain proof of the 5-colouring of any planar graph is easy and beautiful.


I thought this article was going to be about Appel and Haken's proof of the Four Color Theorem, as it was the first to use software rather than an actual analytical proof. As a result mathematicians were split: while the proof may have shown the theorem was true, it didn't explain why it was true, because the question had been reduced to a brute force search.

https://math.stackexchange.com/questions/23409/did-the-appel...


For anyone interested, there's a really excellent book on the history of the problem as well as the AH proof, going into quite a lot of detail. Very highly recommended.

"Four Colors Suffice: How the Map Problem Was Solved" by Robin Wilson.

https://www.goodreads.com/book/show/450635.Four_Colors_Suffi...


I can't help thinking that it should then be possible to write a tool that ingests a codebase and shows you a graph similar to the ones in this article. Perhaps that would be a good way to figure out where to start working on reducing the code complexity. But I have to think on this for a while to see if I still believe it after mulling it over.
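A starting point for such a tool, at least for Python codebases: the stdlib ast module can pull import edges out of source text (this is only a sketch of the idea, not a finished tool):

```python
import ast

# Extract module-level dependencies from one file's source; run this
# over every file in a codebase to build the edge list of the graph.
src = '''
import os
from collections import deque
'''

tree = ast.parse(src)
deps = []
for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        deps += [alias.name for alias in node.names]
    elif isinstance(node, ast.ImportFrom):
        deps.append(node.module)
print(sorted(deps))  # ['collections', 'os']
```

From the resulting edge list you could feed the graph to a layout tool and color it the way the article's diagrams do.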


Ages ago (back when Python 2.5 was still a thing) I used a lib that would generate a call graph for Python programs. I think it was this one:

  http://pycallgraph.slowchop.com/en/master/
I don't remember much, but I think this wasn't terribly useful at the time, because it became very messy very quickly. Any moderately complicated application will be calling a dozen functions, and the graph then becomes a gigantic mess of interconnected nodes which is very hard to read.

But like I said, it was ages ago, so I don't know how good/bad it is now.


For Python there is Snakefood. I have had good results using it on a huge hairball project to discover interesting and useful things about the topology of the module dependencies.

http://furius.ca/snakefood/

http://furius.ca/snakefood/doc/snakefood-doc.html


Sure that's possible. I saw Matt Mackall do it for Mercurial first:

https://www.selenic.com/blog/?p=626

In rust, there's cargo-graph to do this for your packages:

https://github.com/kbknapp/cargo-graph

I used that on Servo once; see the PNG at https://dirkjan.ochtman.nl/files/servo-graph.png or the SVG at https://dirkjan.ochtman.nl/files/servo-graph.svg.

From these explorations, I think the four color theorem-type planarity or connectedness is actually less important than a lack of cycles for the perceived complexity, or the difficulty encountered due to that complexity.



yea, that's been one of my goals. I would like to visualize the graph over time to see how the complexity grows.


It's not obvious, but the graphs on this page are interactive. You can drag the nodes around, and click them to change their colors.


I guess that would explain the insanity surrounding the node labels. Each number has position:fixed and a coordinate relative to the scroll position of the viewport. They update their position slightly out of sync with the scrolling motion, which is extremely unsettling to watch. Seems almost designed to induce motion sickness.


I think if you want to represent dependencies by a graph, it should be a directed graph, not an undirected graph.

I think once you put in direction, you will see that cycles are much more important than overlapping edges in determining complexity. If you can keep the graph acyclic, life is pretty easy. I think big cycles are worse than small cycles, and cycles that overlap each other make things much more complex.
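Checking for those cycles is cheap; a sketch of DFS-based cycle detection on a directed dependency graph (the module names are invented for the example):

```python
def find_cycle(graph):
    """Return True if the directed graph contains a cycle,
    using three-color DFS (white=unvisited, gray=in progress,
    black=done); a gray neighbor means a back edge."""
    WHITE, GRAY, BLACK = 0, 1, 2
    state = {n: WHITE for n in graph}

    def visit(n):
        state[n] = GRAY
        for m in graph.get(n, []):
            if state.get(m, WHITE) == GRAY:
                return True          # back edge: cycle found
            if state.get(m, WHITE) == WHITE and visit(m):
                return True
        state[n] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)

acyclic = {"app": ["db", "ui"], "ui": ["db"], "db": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(find_cycle(acyclic), find_cycle(cyclic))  # False True
```

An acyclic dependency graph always admits a topological order, which is what makes layered, "life is pretty easy" architectures possible.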


I wonder whether it would be possible to look at dependencies the way McCabe looked at flow control.

http://www.literateprogramming.com/mccabe.pdf

That is, we use a strategy for substituting away subgraphs; we can classify programs by what results from that process. It may turn out that there are particular problematic subgraphs to avoid.


It's also very helpful in discovering the best spots to test your code : https://blog.chewxy.com/2017/01/04/what-to-test/


In the context of HOTT? That most code in the real world can be embedded in low dimensions, and when you embed something on a surface of small genus you can do a bunch of optimizations.

Also, that you can use the finite set of minimal graph minors for a surface of genus X to do case analysis instead of having an infinite amount of cases to test with.


Interesting analogy.

When I was told about this many years ago, it was described as the 'n factorial problem'. This framing assumes that the directionality of the communication is important. If you have two nodes, there are 2! ways to communicate. If you have three, there are 3! or 6 ways to communicate between the nodes. The suggested solution was similar: group components together behind a black-box facade so that 30 components can be reduced to 3! main communication paths or fewer.
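As a quick sanity check on the arithmetic (strictly, the count of ordered communication pairs is n * (n - 1), which happens to coincide with n! at n = 2 and 3):

```python
def paths(n):
    """Directed communication paths between n components:
    the number of ordered pairs, n * (n - 1)."""
    return n * (n - 1)

# 2 components -> 2 paths, 3 -> 6, 30 -> 870.
print(paths(2), paths(3), paths(30))  # 2 6 870
# Grouping the 30 components behind 3 facades collapses 870 paths to 6.
print(paths(3))  # 6
```

Either way the growth is superlinear, so the facade trick pays off fast.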


> The first thing that struck me, as a software engineer, is that four colors is not very many.

I really like the visualization. It's a happy medium between something like UML and something that's actually useful, and creates useful constraints around communication channels between classes.

I'm sure there's some design pattern that breaks the rule, but overall trying to force a "4 color" programming pattern seems like it can't do anything but help.


This reminds me of the Complexity Principle, as stated by Dan Ingalls who was responsible for much of Smalltalk ...

"If any part of a system depends on the internals of another part, then complexity increases as the square of the size of the system."

A link to a slide from him showing n^2 dependencies - http://bit.ly/2t0xukK


If you were weirdly interested in the four color theorem, you might like this common coding interview question about graph coloring: https://www.interviewcake.com/question/graph-coloring


That's a rather nasty interview question, though. If you've never heard of the problem before, writing the greedy algorithm isn't too bad. But it is actually more challenging if you are familiar with graph colouring, as knowing that it's NP-complete might be enough to stop one from even considering this to be an edge case...

If you really want your interviewees to hate you, ask them to solve it using D colours!
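For reference, the greedy algorithm does always manage with at most D + 1 colors, and K4 shows that bound is tight (this is my own sketch, not the interview site's solution):

```python
def greedy_color(adj):
    """Greedy coloring: since a node has at most D neighbors,
    some color in {0, ..., D} is always free, so at most
    D + 1 colors are ever used."""
    colors = {}
    for node in adj:
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

# K4: every vertex has degree D = 3, and all four vertices are
# mutually adjacent, so exactly D + 1 = 4 colors are required.
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
print(len(set(greedy_color(k4).values())))  # 4
```

So D colors alone can't be guaranteed, which is presumably why the question asks for D + 1.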


The problem description doesn’t specify “planar”, and K4 has degree 3 for each of its four vertices, but (obviously) requires 4 colors (and K5 has maximum degree 4, but requires 5 colors, etc.)

Also, K3, K2 and K1 are planar with maximum degree 2, 1, respectively 0, but require 3, 2, 1 colors.

So, in general, D colors isn’t sufficient.


Did he miss a connection between the top red blob and the green one?

Also, what's the name for the process for transforming a "map" into a connected network, just like he did at the beginning?


How do you color things like a method returning an un/ordered list which you are depending on? There’s no code to ”color”?


Shouldn't there be a connection between the top-right red circle and green circle in that very first graph?


Also known as tight vs loose coupling.


This is a nice analogy, but these days I feel like abstraction is usually a bad thing. By its very nature, making things more abstract makes it less clear what you are actually doing; you end up going through all these layers of indirection. If I want to use the data from some node (aka object) way over there, I should just be able to use it, without it getting pulled through all of these other things first.

To use the same node graph analogy, when you focus too much on reducing the number of colors, you end up needing to create tons of nodes between the nodes that actually DO SOMETHING. Your software ends up more complex and difficult to manage.


By its very nature, abstraction separates the incidental from the fundamental (where "incidental" and "fundamental" are defined on a per-abstraction basis).

If the problem you're trying to solve focuses on the fundamentals of the abstraction, you're most certainly gaining clarity (possibly at the expense of performance).

If you find the abstraction's making you lose clarity, it might very well be that you're really working on incidental complexity, but there's also a distinct chance you just chose the wrong abstraction.


That makes sense in a perfect world where the abstractions are perfect, but they rarely are. We make assumptions about the abstractions that often turn out to be false. Obviously a certain level of abstraction is necessary, but in general modern software development goes way too far with it.


Is your comment supposed to be an example of how abstraction is a bad thing? I have no idea what you're talking about.


My comment makes sense in the context of the original article, if you didn't bother to read that, then I can't help you.



