I've seen excessive abstraction kill projects time and time again. The crime scene is similar in every case:
- What if we ever need to change database server or driver? Proceeds to write a layer to abstract data access
- We might have different login forms one day? LoginFormFactory it is
- This code hits our KV Redis/memcache/NoSQL by calling the driver lib directly? Can't have that, I'll write a CacheStorage or DocumentStorage layer.
Over time, this defensive mentality produces codebases that are so bloated that no one on the team can comfortably grasp all the moving parts, despite the project not being rocket science.
At that point, devs would rather quit or get a substantial raise in order to continue working, and ultimately end up leaving to start a new project. But this time, they'll be sure not to make the same mistakes. This time, they'll abstract even more so that "when one part of the project becomes bloated, it can be easily rewritten thanks to the abstractions". And the cycle continues.
Abstraction isn’t always applied for rewrite flexibility; in fact, I claim it rarely is. Often it is used primarily as a way to decompose complexity: I want to write this component without deeply considering specific interdependencies, so I’ll just abstract those away and get on with it. The code will only ever be composed with specific things, but considering all those interactions up front is a PITA.
Overly abstract code could just be a sign of a programmer working through a problem that was not well understood (either by everyone or just themselves). The second or third time they do the same task, the code is bound to be less abstract because they are more comfortable in dealing with interdependencies up front. Suggesting refactoring or rewrites can work well in that case.
Which is not to say that we shouldn't wrap dependencies. We absolutely should. But by default, the wrapper should not offer an abstraction that is so flexible that the dependency can be exchanged. Instead, it should start with the actual problem, by offering only abstractions that (a) make sense in the context of the program and (b) translate well to concepts of the dependency/library.
For example, hiding POSIX-style select/poll vs Win32 completion ports behind a common abstraction is something I tried recently, and it turned out to be HARD. I did not complete it. Why? I started with the abstraction without being familiar with what the problem was. All I knew was that I wanted to experiment with "game engine design".
Even then, often there will be small bits that don't fit well. By wrapping everything you make the model and assumptions of your application explicit. Code that is written against your own model (which is almost always simpler than what the library offers) will be more understandable.
I really like the Dijkstra quote, "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."
Usually I even wrap libc and POSIX functionality in my C programs. Advantages include: (1) not having to mess with specific types, such as size_t, time_t, DIR*, and so on -- which often don't fit my own model (e.g. dealing with unsigned size_t vs signed integers is painful) and often need small fixes to be portable; (2) compilation is a lot faster since I don't need to include tens of thousands of lines from system header files; (3) not wrapping makes huge portions of the project dependent on the library, instead of only the wrapper; (4) you tightly control access to the library, which makes it easier to troubleshoot. You can also make up your own strategy for managing library objects in a central place.
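To make that concrete, here is a minimal sketch of the kind of wrapper I mean, for nothing more than listing directory entry names (all the my_dir names are invented for this example):

    /* dirlist.h -- the rest of the project includes only this header,
       so it never sees DIR*, struct dirent, or any system header. */
    typedef struct my_dir my_dir;

    my_dir     *my_dir_open(const char *path);   /* NULL on failure */
    const char *my_dir_next(my_dir *d);          /* next entry name, or NULL when done */
    void        my_dir_close(my_dir *d);

    /* dirlist.c -- the only translation unit that includes <dirent.h>. */
    #include <dirent.h>
    #include <stdlib.h>

    struct my_dir { DIR *handle; };

    my_dir *my_dir_open(const char *path)
    {
        DIR *h = opendir(path);
        if (!h)
            return NULL;
        my_dir *d = malloc(sizeof *d);
        if (!d) {
            closedir(h);
            return NULL;
        }
        d->handle = h;
        return d;
    }

    const char *my_dir_next(my_dir *d)
    {
        struct dirent *e = readdir(d->handle);
        return e ? e->d_name : NULL;
    }

    void my_dir_close(my_dir *d)
    {
        closedir(d->handle);
        free(d);
    }

Only dirlist.c knows about dirent.h; everything else in the project works against the three functions in the header.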
I think this is nonsense and leads to enterprise-y type patterns and hard to understand code.
Wrapping things for the sake of it is a recipe for unnecessary verbosity and a hard-to-learn codebase, as things are obfuscated and take longer to trace through.
In C, your point 1 makes me think your code is unsafe, if you're having trouble keeping track of types and signedness. Point 2 is fair enough, but I think a lot of C headers are granular enough that it's irrelevant, especially given how fast everything is these days. Gone are the 8 hour builds of old.
Point 3 is precisely the type of premature interface-isation that's the problem here. If you're never going to change it, then it doesn't matter whether huge portions of the code rely on the library or your wrapper. It's basically the same thing. Only you've put in extra work to wrap something in another layer for no gain. Point 4 is much the same.
Hmm, I think I should put that in context. I'm not saying write a wrapper for the sake of it. I don't wrap everything in three layers of OO giftwrap. I'm saying one should follow basic rules of hygiene.
For example, I have a FreeType backend in my code, and I have a single file which implements a font interface (open font file, choose font size, draw to texture in the format of my app) by interfacing with FreeType. It returns my own error values, prints warnings with my own logging framework, puts the data in my own data structures, etc. All things that FreeType could never do, since it does not know my project.
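A rough sketch of what that single-file boundary can look like (the interface names and error enum here are invented for illustration, not copied from my actual project):

    /* font.h -- the only font API the rest of the app sees.  Error codes
       and the texture type are the app's own; nothing here mentions FreeType. */
    struct app_texture;   /* the app's own pixel format, defined elsewhere */

    typedef struct font font;

    enum font_error {
        FONT_OK = 0,
        FONT_ERR_NO_FILE,
        FONT_ERR_BAD_FORMAT,
        FONT_ERR_OUT_OF_MEMORY
    };

    enum font_error font_open(const char *path, int pixel_size, font **out);
    enum font_error font_draw_glyph(font *f, unsigned long codepoint,
                                    struct app_texture *dst);
    void            font_close(font *f);

    /* font_freetype.c is then the only file that includes <ft2build.h>,
       maps FT_Error values onto enum font_error, and routes warnings
       through the app's own logging. */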
I have an OpenGL backend, and I make sure that my geometry and math stuff, UI drawing code, and whatnot, does not depend on OpenGL, or on OpenGL types. (I did that once when I learned OpenGL, and it was terrible.) So now I have basically one file which is dependent on OpenGL, and it's there simply to put my internal data structures on the screen, in a very straightforward way.
Same goes for my windowing backend. I use GLFW currently, but I make sure I don't use e.g. GLFW_KEY_A in client code. I define my own enums and my own abstractions. It does not come with any cost to simply have my own KEY_A, which means I can swap out the backend relatively easily, and can change my model more easily (for example, make up my own controls abstraction instead of relying on a standard 104-key PC keyboard, etc).
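As a minimal sketch of that translation (assuming GLFW's standard key macros; the app-side names are invented here):

    /* input.h -- client code only ever sees these. */
    enum app_key {
        APP_KEY_UNKNOWN = 0,
        APP_KEY_A,
        APP_KEY_ESCAPE,
        APP_KEY_SPACE
        /* ... */
    };

    /* input_glfw.c -- the single place that knows about GLFW key codes. */
    #include <GLFW/glfw3.h>

    enum app_key app_key_from_glfw(int glfw_key)
    {
        switch (glfw_key) {
        case GLFW_KEY_A:      return APP_KEY_A;
        case GLFW_KEY_ESCAPE: return APP_KEY_ESCAPE;
        case GLFW_KEY_SPACE:  return APP_KEY_SPACE;
        default:              return APP_KEY_UNKNOWN;
        }
    }

Swapping GLFW out later means rewriting only this file, and the enum is free to grow into a proper controls abstraction.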
> In C, your point 1 makes me think your code is unsafe, if you're having trouble keeping track of types and signedness.
No, it's safer precisely because I have a well-defined boundary where values get converted. I can check for overflows there, and not worry about the rest of my code.
There is no practical way to deal with size_t in every little place when normal values come as int.
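To make that concrete, a minimal sketch of the kind of boundary conversion I mean (the function name is invented here):

    #include <limits.h>
    #include <stddef.h>

    /* One checked conversion at the boundary; the rest of the code can
       keep using plain int. */
    static int size_to_int(size_t n)
    {
        if (n > (size_t)INT_MAX)
            return INT_MAX;   /* or log / abort, whatever the app's policy is */
        return (int)n;
    }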
> Point 2 is fair enough, but I think a lot of C headers are granular enough that it's irrelevant,
You bet. Have you ever tried to access parts of the Win32 API? You're immediately in for (I think it was) 30K lines of code even if you define WIN32_LEAN_AND_MEAN. Otherwise it's maybe more like 60K. It's crazy. Or check out GLEW (which I don't use anymore). It generates 27000 lines of header boilerplate simply to access the OpenGL API.
Add to that that most headers have a standard #ifdef include guard, which means that while a file's contents get ignored the second time it is included, the lexer still has to scan the whole thing over and over again.
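A sketch of how a thin wrapper header keeps <windows.h> out of the rest of the project (all names here are invented for illustration):

    /* win_file.h -- what the rest of the project includes.  No <windows.h>
       here, so translation units that use it stay cheap to compile. */
    typedef struct win_file win_file;   /* opaque; wraps a HANDLE internally */

    win_file *win_file_open_read(const char *path);
    long      win_file_read(win_file *f, void *buf, long nbytes);
    void      win_file_close(win_file *f);

    /* win_file.c is then the only file that does
           #define WIN32_LEAN_AND_MEAN
           #include <windows.h>
       and converts HANDLE/DWORD/BOOL into the plain types above. */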
> If you're never going to change it
How would you know?
> then it doesn't matter whether huge portions of the code rely on the library or your wrapper.
It does matter a lot if there is a semantic mismatch.
You have a target system, and no current plans to change it. You can't predict every change that might happen down the line, and neither should you try - you'll waste a ton of engineering effort, and if there is change, it's almost always change in ways you didn't anticipate.
I absolutely agree with your statement, but I don't think it applies in this situation. I'm not trying to predict future changes. It's the opposite. I'm codifying the current state. I make sure that there are no misconceptions about the extent to which the library is used.
Yes, a library is already an abstraction, just somebody else's. Thread parent is complaining about bad design that doesn't model the problem, not abstraction.
There's a case for having a "convenience" or "easy mode" wrapper where the amount of configuration is lower, hence the underlying dependency is more straightforward to reuse.
For example, I would point to 3D graphics APIs as a reasonable case for this. All the ones in use today assume a lot of low-level detail about buffers and pointers and strides and attribute flags, which makes building up test cases challenging: you have to consider dozens to hundreds of options, and setting one wrongly will result in no image or a crash.
So, instead, many folks turn to copy-paste of known working examples, give them a minimal amount of additional configuration, and extend that into the frontend that they habitually use, rather than directly accessing the API in question. The full power is still there - it's not really abstracted - but the workflow has been pushed towards the average case.
I wouldn't necessarily consider that a wrapper, personally, but a reasonable set of functionality behind an interface to said functionality. The only thing I'm really arguing with is the insistence that one should wrap everything as a matter of course.
Reduction of complexity, like your example, is great. Increasing levels of indirection thoughtlessly is adding to it, IMHO.
I got the impression you were saying every dependency should always be wrapped, regardless.
I was a C coder for many years; these days I seem to be doing Java. There are dependencies in my recent projects that just do what I need, no particular domain translation required. Particularly things like the Apache Commons libraries, which provide well-formed utilities for common operations. It would be a waste of time and energy to wrap them simply for the sake of having a wrapper.
If this sort of thing isn't what you were driving at, then we've just been miscommunicating. I am 100% for encapsulation of functionality into good, discrete modules which provide sensible interfaces and minimal (but expressive) APIs. I just don't like the blind application of "this isn't our code, therefore we must provide an interface".
Dependencies should be treated as if they were your own code. After you have replaced some API for the third time, you can start thinking about wrapping it.
Or, wrapping dependencies from the start can make sense when the library is frequently updated and/or you touch it in many places in your code, especially across multiple projects. Those are good places for wrappers/facades/adapters.
An example would be in games, when you submit an achievement or leaderboard score to a third party such as Apple GameCenter, Google Play Game Services, or other systems like Amazon Game Circle, web or custom. These libraries are frequently updated, they change per platform, and you will touch them in many places in the codebase, so wrapping them in a progression/statistics/achievements wrapper/facade/adapter is smart from the start.
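A rough sketch of the shape such a facade might take, in C with pluggable backends (all names invented here; real backends would plug in behind it):

    /* achievements.h -- the game talks only to this; backends plug in behind it. */
    struct achievement_backend {
        void (*submit_score)(void *ctx, const char *board_id, long score);
        void (*unlock)(void *ctx, const char *achievement_id);
        void *ctx;
    };

    void achievements_set_backend(const struct achievement_backend *backend);

    /* Game code calls these; they forward to whichever backend is active
       (Game Center, Play Services, a custom web service, or a no-op stub). */
    void achievements_submit_score(const char *board_id, long score);
    void achievements_unlock(const char *achievement_id);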
Even if you have wrapped it, you still have to update your code when there are breaking changes. If you follow "don't repeat yourself" (DRY), there will not really be more work updating than if you had wrapped it. (You should however not follow DRY slavishly - but that is a discussion of its own.)
If the wrapper is generic enough, then it can be less work across projects and codebases, and it won't need to change along with the actively updated library it wraps.
If the abstraction wrapper is tightly coupled, uses third-party library types that can change, or is leaky, then yeah, the effort is moot and you end up with more work.
Another case is where you have some sort of messaging system where you want responses/events to be uniform to your system rather than unique to a platform, or even to a company/product that doesn't match your codebase. An example might be wrapping a game recording library, audio library, or something that does not fit well in the codebase or complicates maintenance, even stylistically/standards-wise, or where you only need a small part of the library, like activating it, common message types, or the data level.
Game engines themselves are massive wrappers around many wrapped systems.
> Dependencies should be treated as if they were your own code
Only a Node.js developer could say that. Maybe it's true when the dependency is left-pad or something.
Experienced developers in other languages will acknowledge that (non-trivial) code that is not your own code, is not your own code. You seldom have a very good understanding of the underlying models, and the library has absolutely no understanding of your own model. And it's very hard to make a change in code that is not your own.
Case in point, I just read the title "Linux fsync() issue..." on the HN frontpage. If you have a large project, say a database, I can only hope that you properly abstracted your synchronization model.
If you find an issue in a dependency, it's better to fix the dependency, assuming it's open source, rather than wrapping it. Then everyone using it will profit, and won't have to write wrappers.
You can abstract the platform layer. It's a lot of work, but might be less work than maintaining a code base for each platform.
If you are working with a very low-level API, it might be a good idea to abstract it though. You always want to go up one abstraction level, not abstract "sideways". And be aware of the trade-offs, like performance, and also being able to understand the code. For example: developers might know the low-level API because it's common knowledge; if you have then put a (leaky) abstraction on top of it, the code might become much harder to understand.
You cannot fix the dependency. It's not broken. Libraries are not designed just for your application. That's the point of libraries. And that's why you need a thin layer that translates from the library's less concrete and more complex model to your application's more concrete and less complex model.
> You always want to go up one abstraction level
I don't think there is such a thing as "abstraction levels". As it says in the Dijkstra quote, abstractions are semantic models. Interfacing means translating between models.
Translations between abstractions might or might not be fully realizable. Very often there's a mismatch and the translation is not possible perfectly, in which case it's a leaky translation. And that's ok. Often the only way to deal with reality is to ignore some difficult parts of it, since otherwise the project couldn't be completed. For example, RPC is an abstraction for network requests to be modelled as function calls. This ignores the reality of the unreliability, throughput, and latency of real-world networks.
And that's ok. There might be some situations in which the program does not work in the real world. But mathematically speaking, at least the program is correct (in a very obvious way) with regards to the simpler model which, unrealistically, assumes that RPC works just like function calls.
So, RPC is not an abstraction that is somehow built on top of network infrastructure abstractions. It's only typically translated to the semantic world of networks. That's an important difference.
>I've seen excessive abstraction kill projects time and time again
Agreed.
But none of your first few examples seem particularly egregious, if done well, they don't seem like they would sink a project.
I think part of the problem is developers underestimating just how hard it is to write an abstraction layer well. The juniors will say, "yeah, I could do that in one sprint"; then they start to build on top of it, and the problems don't start to show until later, maybe when the dev who wrote it is gone.
I think the correct way to approach a problem like this is to acknowledge the cost up front. Abstraction layers are expensive. They don't just have to work. They have to work well and be readable.
I confess I've never seen that happen, but I do think excessive abstraction is killing the argument this article is trying to make.
When would anyone ever make an inc_pair function? Or a hierarchy of 'animals that lay eggs'? I guess these are supposed to be examples of something, but I'm not sure what. I wouldn't even call them "abstraction". They seem like completely hypothetical examples of how to apply programming language features to non-problems in bad ways.
Without knowing what the designer of these programs is trying to model, it's impossible for me to understand what they're trying to accomplish or why they think applying these language features in this way makes sense at all.
Both of these examples sound like they increase the number of lines of code, for no appreciable benefit, and I recall Yegge's old observation that the main issue with any codebase is simply the bulk of it.
On the other hand, the lens library in haskell is extremely abstract. Libraries obviously have different tradeoffs but using lens can increase the onboarding cost significantly for new team members.
I am mentioning lens because it has a both function:
incBoth = both += 1
Which has a fun type signature:
incBoth :: (Bitraversable p, MonadState (p a a) m, Num a) => m ()
To me the value of abstracting out those services is testability - if they're abstracted, they can be mocked. The smaller the abstraction, the smaller the surface area you have to mock.
The smaller the abstraction, the more complex the emergent behaviour is when you combine all the itty-bitty little abstractions together, and the harder it is to write tests that cover real use cases instead of testing implementation details, which is another way of saying "your tests will be brittle".
You can band-aid this to an extent with higher-level "integration" tests, but if you'd done things at the right level of abstraction in the first place you would carry less weight around and wouldn't have to maintain a bunch of brittle tests in the first place.
This is obviously all shades of grey, but if you're mocking out things that aren't I/O you're probably doing it wrong.
If you find yourself vehemently disagreeing with this I'd be interested to know if you've ever had to refactor or simplify a codebase with a bunch of overly-abstracted, itty-bitty things that had very tight coupling via mock behaviour to all their tests, and if so, whether that felt pleasant to you or not. If you haven't then you probably haven't seen the considerable longterm maintenance downsides to this kind of approach and I feel sorry for the poor folk who will inherit your codebase.
Also curious is that many codebases I come across that look like this often have terrible copy/paste mock setup across lots of tests, making the issue even worse when you want to change things.
Those sorts of codebases often end up with people wrapping the abstractions in other abstractions because they're sufficiently resistant to change as a result that that seems easier. This obviously makes everything even more resistant to change (especially as the wrapper abstraction usually depends deeply on all the behaviour underneath it, and the mock set-up to test the wrapper ends up as an exercise in mentally mismodelling how the other components actually behave).
>The smaller the abstraction, the more complex the emergent behaviour is when you combine all the itty-bitty little abstractions together
That's not necessarily true. Abstractions with a small surface area (exposure to their 'outside world' - e.g. via function signatures) that are very deep (they hide a lot of complexity) make more complex behaviour much easier to manage.
When the surface area is high and the depth is low, the overhead of the abstraction tends to exceed its use value.
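A small illustration of the difference, with invented names (the first hides a whole open/read/retry/close dance behind one call; the second forces the caller to know the dance anyway):

    /* A "deep" interface: tiny surface, a lot hidden behind it. */
    char *slurp_file(const char *path);   /* malloc'd contents, or NULL on error */

    /* A "shallow" wrapper over the same thing: the caller still has to
       know the whole dance, so the layer adds surface without hiding much. */
    void *my_open(const char *path);
    long  my_size(void *handle);
    long  my_read(void *handle, void *buf, long nbytes);
    void  my_close(void *handle);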
Abstraction is useful when the resulting code is logically simpler, much like say category theory makes certain results in algebraic topology seem obvious.
Perhaps, you simply mean a different thing when you say "abstraction".
I find that there's two core points that help me figure out where I'm missing some form of abstraction: the body of a function should only operate on a single level of abstraction, and it should only know technical details about one thing.
If you're writing some database code, having high-level fetchMyEntity() calls mixed with connection/resultset/cursor logic is bad news.
If you're writing something that reads from a message queue and stores the message in a database, the place where the two meet shouldn't know much of anything about either the message queue or the database.
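A minimal sketch of what that meeting point can look like (all names invented here; the concrete queue client and database driver live behind the two interfaces):

    /* The app's own message type and the two interfaces it talks to. */
    struct app_message { const char *id; const char *body; };

    struct message_source {
        int (*next)(void *ctx, struct app_message *out);      /* 0 = got one */
        void *ctx;
    };

    struct entity_store {
        int (*save)(void *ctx, const struct app_message *m);  /* 0 = ok */
        void *ctx;
    };

    /* The meeting point: it knows neither the queue client nor the DB driver. */
    int pump_messages(struct message_source *src, struct entity_store *store)
    {
        struct app_message msg;
        while (src->next(src->ctx, &msg) == 0) {
            if (store->save(store->ctx, &msg) != 0)
                return -1;   /* error policy stays the app's, not the driver's */
        }
        return 0;
    }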
Obviously, exceptions exist where these rules need to be broken, but I find they're fairly few and far between.
The "Simple Made Easy" talk describes this as the root of complexity in an application. Mixing and tying various subcomponents that shouldn't have to know each other exist.
> Mixing and tying various subcomponents that shouldn't have to know each other exist.
And this is the real trap when dealing with abstractions - maybe the implementations of two different operations are the same, but if the semantics are different then de-duplicating them creates an unnecessary dependency between the two. The more cross-links in your application structure, the harder it is to do anything without breaking everything.
> And this is the real trap when dealing with abstractions - maybe the implementations of two different operations are the same, but if the semantics are different then de-duplicating them creates an unnecessary dependency between the two.
One of the truly remarkable things about Haskell is that its approach to structural abstractions allows you to break out of this problem (though you end up paying a different cost for it)
Yes. I look for something I can measure when I make this decision. Will it result in less code? Will it improve performance? Will users/customers care?
Lately I've been thinking about the similarities between the abstraction that results from post-hoc software refactoring and post-hoc mathematical proof "refactoring". In both cases the refactoring is an attempt towards a more ideal form, but that form is almost always more impenetrable to newcomers.
E.g., a quote about the mathematician Carl Gauss:
> Gauss' writing style was terse, polished, and devoid of motivation. Abel said, `He is like the fox, who effaces his tracks in the sand with his tail'. Gauss, in defense of his style, said, `no self-respecting architect leaves the scaffolding in place after completing the building'.
There's a difference between deduplication/extraction and abstraction which often seems to get lost. Abstraction is not about the mechanical reorganization of code. It is structuring the code to follow a natural/logical abstraction that exists outside of the code. The first clue is the name of the abstraction. If it describes how the 'abstraction' works or what is going on inside it, then it's best to leave it. An abstraction should be able to opaquely represent what it is. This is the value of abstractions: it lets you not think about what's inside while working at a higher level.
A similar issue I have is with people constantly 'refactoring'. I choose to say I'm 'factoring' code instead. If you can't name the factors that you're separating, then it's likely you'll change your mind and end up 'refactoring' it. Sometimes you take factored code and factor it further, which I don't have a different name for; it's just more factoring.
Introducing an abstraction is a way of extending the "base" language into the domain of the problem being solved. Viewed in this way, creating an inc_pair() function does not
make sense beyond applications that deal with incrementing stuff. On the other hand, if we are moving a player on a grid, move_diagonally_up() makes sense and is worth introducing into the extended language vocabulary.
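For instance, a minimal sketch (assuming a grid where increasing y means up):

    struct position { int x; int y; };

    /* inc_pair() would describe how the code works; this describes what
       it means in the game's vocabulary. */
    void move_diagonally_up(struct position *p)
    {
        p->x += 1;
        p->y += 1;
    }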
Abstractions are of course due when the time is right.
It's difficult to address, but IMO introducing abstractions to a codebase should not be the goal from the get-go.
There is also a cultural thing around abstractions, where inexperienced developers look up to, or are fascinated with, the complexity of something or someone that brings that to the table.
It's also a common feature of certain so-called architects, because they probably feel they need to use some advanced techniques.
The thing is, though, that when you have worked with developers or architects who advocate for simple abstractions, and over time that proves to be both efficient and cheap, then you start to doubt complexity altogether.
Also remember that complexity is often not complex per se, as long as you spend time on breaking that complexity down.
And then you have a better fundamental platform for solving what you need to.
Simple code is fast code. And also easier to change.
Very insightful. I like thinking about software complexity, and one of the concerns when dealing with complex software is that the design and intentions should be communicated (which means either documentation or a common understanding of purpose and function).
From your perspective, it means that there is also a need to establish agreements on the levels, depth, and ways abstractions in the code are formed. Indeed, I worked with software where the functions and operations weren't implemented in a messy way per se, but the many levels of indirection, abstraction (and obscurement) made things just really difficult to read and a real tail-chaser when it came to maintenance.
Those levels can also make it much more difficult to understand the flow and the operations that are happening, because in many languages you pass references to data objects, so data gets changed in many ways.
Nice article, puts me into thinking mode again! :)
I totally agree that the cost of abstraction should become more important. It is often the number one cause of frustration when trying to add a new feature to an existing code base.
Building a mental model of any code base takes time. Abstractions for the sake of abstraction makes it harder to grasp.
Abstraction should be based on whether it makes the code easier or harder to reason about: to see, to predict consequences of decisions, to diagnose causes of behaviour.
Usually, the established abstractions of the domain are an excellent guide. Even if you think they are sub-optimal objectively, they are easier to reason about for experts in that field.
I like the article's language of containing the "damage" code can do.
A C-style modularization technique I haven't seen used is long functions, with sections' variables' visibility limited by braces.
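A minimal sketch of the technique (the section contents are elided; the point is that each block's locals cannot leak into the others):

    void run_frame(void)
    {
        /* --- input --- */
        {
            int key = 0;               /* visible only inside this block */
            /* ... read and handle input via key ... */
            (void)key;
        }

        /* --- simulation --- */
        {
            double dt = 1.0 / 60.0;    /* cannot leak into the other sections */
            /* ... advance the world by dt ... */
            (void)dt;
        }

        /* --- rendering --- */
        {
            /* ... build and submit the draw list ... */
        }
    }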
You are exactly right. It's indirection without abstraction. Good abstractions are simpler than the sum of their parts, the author's example of abstraction is clearly far more complicated (templates, references) than two increment operations. Good abstractions are ones you don't even think about, like arithmetic operators. How often do non-embedded/non-OS developers think about how multiplication is implemented?
Quoting from the blog post:
>It also seems that Go's implicit interfaces were designed to avoid unnecessary abstraction.
Actually, interfaces are often extremely abstract in Go. How much more abstract does it get than io.Reader? It's a thing that you can read bytes from into slices (arrays). The io.Reader abstraction is far simpler than os.File, net.Conn, or even bytes.Buffer (a file object, a network connection, and an in-memory buffer, respectively).
He shows an example of indirection without abstraction, then blames abstraction. His point isn't wrong, the problem is that he's confusing abstraction with any form of indirection. He claims that abstractions increase cognitive load when in fact the opposite is true. Bad abstractions and indirection without abstraction increase cognitive load.
I agree with him that abstractions should be crafted carefully, but not because abstractions are categorically bad. If he thinks abstraction is so bad, why does he write software at all? Software is extremely abstract, even assembly language is an abstraction of what the CPU does. Human language is abstract too: we can talk about trees without having to consider any specific tree.
Abstraction is a fundamental building block of human civilization. Of course, he doesn't actually believe that abstractions are bad. It would be nice if he differentiated between good abstraction and bad abstraction.
On a side note: I take issue with his criticism of mocking. Good abstractions are easily mocked and make the code base easier to understand because you don't have to consider every detail of the application at the same time. On the other hand, mocking concrete objects,[0] rather than abstract roles, definitely complicates things.
But please don't. Doing that adds a cost to the community, by toxifying it. Because of that, your pro-community intention is not only not fulfilled, it's actively damaged.
I find facile dismissals irritating as well, and lord knows this site gets a lot of them. But the way to push back is with a clear, positive defense of whatever was unfairly dismissed. Venting doesn't help; it only invites more venting.
I think it's better to give feedback explicitly rather than sneering. The latter just feels bad to read (even as a third party) but doesn't really espouse some better active social norms to follow.
> to add a cost (the possibility of being dismissed/sneered back) to such dismissals.
Given the fact that this is a conversation among strangers, I would assert that it isn't really that effective to just add costs by making discourse less pleasant.
--
In general, I think a community is healthier when we treat people 25% better than we expect to be treated, to account for the Fundamental Attribution Error and other misinterpretations.
>I think it's better to give feedback explicitly rather than sneering. The latter just feels bad to read (even as a third party) but doesn't really espouse some better active social norms to follow.
I guess so. Sometimes I'm just pissed off by the easy dismissal, as in "This 5-second basic retort is all you've come up with, and you think you've taken down TFA?".
This whole pointless sub-discussion has had a small cost for me because for some reason I bothered reading it. A simple downvote of the original comment could have provided feedback to the author while sparing the rest of us.
These abstraction discussions seem to always result in commenters talking over each other. I would love to have these conversations rooted in a code sample. Otherwise no one is talking about the same thing.
If someone writes a blog series called “should this be abstracted, what’s the abstraction?” I think we would see some great discussion.
Structuring 1M lines of code as one function or as 1M functions is clearly equally absurd. What is the sweet spot? There must be an answer from psychology and/or information theory.
The answer is: What is the problem? It depends on the problem.
We're unlikely to encounter a problem that is solved with a single function, or with one function per line. But the important thing is to look at what the code should achieve. Most importantly, local decisions should be made based on what the code should achieve in a global context. Local syntactic optimizations are less important than the global picture.
I strongly agree with the Golang authors and with experienced C programmers that it's much better to write a few lines more and be more clear and explicit in exchange. Note that there are some additional lines that increase complexity, but I would argue that lines which contribute to clarity do not contribute complexity. In fact, those investments in additional lines usually decrease the number of moving parts.
Syntactic homogeneity is important so one can easily see what one piece of code achieves in the global context. It does not help if we micro-manage and constantly think about the type of for loop or lambda abstraction or error handling mechanism to use, only to shave another line off.
Unfortunately, looking at the actual problem is what most people forget. Instead the discussions are about languages (filter, maaap. maaaaap), frameworks, libraries, object orientation, without any concern for what these features can do towards reaching a specific goal.
Now that you mention information theory, I want to mention the term "semantic compression", coined by Casey Muratori. I think he has done at least one stream about it, which you should be able to find on YouTube. In general I recommend following him. He is one of the most experienced and no-nonsense guys I've found on the internet.
Keep things 'square'. This is a fuzzy concept and I don't really know how to explain it properly, other than that the effort spent on each layer of abstraction should be roughly equivalent.
Your example is a single 1m-line function or 1m one-line functions. In this case you probably want 100 functions of 100 lines each (and yes, refactoring like this you probably save 100-fold in overall LoC so it works out.) And your 100 functions are probably nested in a 10-deep hierarchy where they all do roughly the same amount of cognitive work.