The Law of Leaky Abstractions (github.com/dwmkerr)
177 points by dwmkerr on May 14, 2019 | 102 comments



I see this fact used pretty often to dismiss the idea that we should use abstractions at all, and I think that's pretty wrongheaded. As an example, it's pretty common to hear that people should prefer writing raw SQL to using an ORM or query builder because those are both "leaky abstractions".

Abstractions aren't necessarily only there so that you don't need to understand anything about what's being abstracted. They are there so that your code can get away from nitty-gritty implementation details, and be more focused on the problem domain.

What would you rather see when you're coming into a new codebase:

    const formData = new FormData();
    formData.append('name', 'Widget');
    fetch(`/api/widgets/${id}`, {
        method: 'POST',
        body: formData
    });
or:

    Widget.save({ name: 'Widget' });


The argument against ORMs isn't that they are a leaky abstraction (although they most definitely are).

The argument is that although they appear to offer you value up front, they cost you much more down the road. The moment you have a hot-path query that needs optimizing, you are dropping down to your ORM's "raw SQL" mode. Then you do it again. Then again. Then you are ripping out the ORM and spending cycles refactoring it out of your code and replacing it with simpler abstractions.

I always find that people who don't believe writing raw SQL is preferable to using an ORM usually suffer from a lack of experience, or have been coerced to use ORMs by their "enterprise grade" language (usually your C# developers of the world; no offense to you all, but if every time you look up examples of data operations it's dealing with Entity Framework, you are probably going to wind up with a lot of devs using Entity Framework). The ORMers, as I call them, don't have confidence in their own SQL ability, and they haven't experienced the aforementioned situation enough times to realize you are better off starting with SQL to begin with.

The realization, if I could summarize it: yes, all abstractions are at least somewhat leaky, so you are better off using simpler ones than complex ones (and if you need something more complex, compose it from simpler ones).


> The argument against ORMs isn't that they are a leaky abstraction (although they most definitely are).

I'm not using an "OO" language these days, so I don't really have a horse in this race, but: isn't the problem you mention here completely explained by the leakiness?

Replace "ORM" with "compiler" and "SQL" with "assembly language", and this is exactly the argument I heard back in the 1980's about why not to use a compiler, or in the 1990's about why not to use a GC. Nobody I know is writing assembly language at all any more. It sucks at composition, so even though it's better at many tactical micro-optimizations, we've improved compilers to the point that we can use them all the time, because it helps productivity massively.

> and if you need something more complex, compose it from simpler ones

Composition is the killer feature in all of software building, and the fatal flaw of assembly, and SQL. We use higher-level languages for their composition abilities, even when they are less efficient. When their inefficiencies make them unusable, we make them smarter, even when it means designing new languages with 100+ keywords (ugh).

If the major complaint against ORMs is that they aren't good enough to make the value worth it in all cases, well, yeah, I've seen that phase of literally every other technology I'm using today. This, too, will pass.


The difference is that in my whole programming career I've never had a problem with a compiler.

I have spent so many hours battling Entity Framework when writing the exact query I need is absolutely trivial. It's a rabbit hole, and it can blow up deadlines unpredictably.

Next time I start a new project I'm just using straight Dapper.


Comparing SQL - a 4GL - to assembly language seems disingenuous at best, to the point where the Blub paradox applies. (And lo and behold, just four days ago you compared SQL to Lisp on the basis of incompatible implementation!)

Unlike assembly, which is famously redundant, SQL doesn't have much to abstract away, and the majority of counterexamples are covered by macros. The things that it really can't abstract are the implementation details that you want to care about when you optimize. Some parts could be a little less clunky, in the same way that the C language could be a little less clunky, but the successful track record of SQL over decades speaks volumes: it's not the stuff going on inside the database causing people the most grief, it's the programming goop that gets attached to it.


He does have a point about composability though. Wonder how things would have turned out had Prolog become the default query language.


This is a pretty condescending take, and doesn't really seem rooted in anything substantial.

The core of your argument revolves around the inevitability of needing to drop down to raw sql for performance -- care to elaborate on that? It seems overblown, I've worked on some apps at decent scale and optimized quite a few queries and I haven't encountered this pattern of needing to throw out the ORM altogether in more and more places.

For any simple to moderately complex queries, the ORM is generating the exact SQL you would write by hand... There are a few pitfalls -- by default you may be fetching more fields than you absolutely need, but this is fairly cheap as long as your rows aren't super wide, and any good ORM offers an easy way to pluck only specific fields that you can use in hot paths. It's often easy to end up with N+1 queries in an ORM, but that's also a well-known access pattern that you learn to avoid with a little bit of experience and doesn't require raw SQL. This last one is an area where I think a lot of ORMs could really improve, in terms of making it harder for inexperienced folks to do the wrong thing. But it's hardly a reason to throw out the whole tool; you can write N+1 queries in raw SQL too.
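
To make the N+1 pitfall concrete, here's a minimal self-contained SQLAlchemy sketch (the models are invented for illustration; echo=True prints each emitted statement so the extra queries are visible):

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

    Base = declarative_base()

    class Author(Base):
        __tablename__ = "authors"
        id = Column(Integer, primary_key=True)
        name = Column(String)
        books = relationship("Book")

    class Book(Base):
        __tablename__ = "books"
        id = Column(Integer, primary_key=True)
        title = Column(String)
        author_id = Column(Integer, ForeignKey("authors.id"))

    engine = create_engine("sqlite://", echo=True)
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        # N+1: one query for the authors, then one more per author when
        # .books is lazily loaded inside the loop.
        for author in session.query(Author).all():
            print(author.name, [b.title for b in author.books])

        # Fix: eager-load the relationship so it is a single joined query.
        query = session.query(Author).options(joinedload(Author.books))
        for author in query.all():
            print(author.name, [b.title for b in author.books])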

I would agree that very complex queries can be more tedious to express in an ORM syntax. A good rule of thumb for when it's worth dropping down to raw SQL from the ORM is if the shape of the query results doesn't match an object shape anyway -- i.e. for complex analytics-type queries that are doing aggregations. But again, I don't see much of an argument for throwing out the whole ORM just because you want to write a small percentage of very complex queries in raw SQL. Use the best tool for the job. I don't think there's really a performance argument for these types of queries either; a really complex query can still be just as slow written in raw SQL. The solution to scaling these kinds of queries in a hot path tends to be denormalizing or precomputing or caching parts of the data.


Agreed - For example in SQLAlchemy, you have:

1) Super high level: my_record_object.filter(whatever)

2) Common query API: my_table_object.select().where(whatever).limit(whatever), etc.

3) Straight-up SQL execution: db_engine_object.connection.execute(SQL query as string)

There are tools to safely interpolate arguments into SQL query templates for use in #3 too. What else could you want in the vast majority of applications?
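
Sketched out with modern SQLAlchemy (the Widget model and all names here are invented for illustration):

    from sqlalchemy import Column, Integer, String, create_engine, select, text
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Widget(Base):
        __tablename__ = "widgets"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        # 1) Super high level: the ORM query API returns mapped objects.
        widgets = session.query(Widget).filter(Widget.name == "Widget").all()

        # 2) Common query API: the Core expression language works on tables.
        stmt = select(Widget.__table__).where(Widget.__table__.c.name == "Widget").limit(10)
        rows = session.execute(stmt).all()

        # 3) Straight-up SQL, with safe parameter binding instead of
        # string interpolation.
        rows = session.execute(
            text("SELECT * FROM widgets WHERE name = :name"), {"name": "Widget"}
        ).all()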


I prefer ORMs to writing raw SQL in application logic - and that is not due to lack of experience with either. (I notice your criticism is mostly ad hominem attacks directed against people using ORMs rather than actual criticism of the technology.)

Note that SQL is also a leaky abstraction over the relational operations - it is just a question of which abstraction is most useful for a particular purpose. LINQ expresses the relational operations just as directly as SQL does, and with a less clunky and more composable syntax.

If you are optimizing raw SQL for performance then you are breaking the SQL abstraction anyway. SQL is supposed to work on a logical level, with the query optimizer doing the rest. I know it is sometimes necessary anyway, but that does not show a weakness in the ORM abstraction; it shows a weakness in the query optimizer.


I'd not mind writing raw SQL if it was anything other than a big, non-typed blob of crap as far as my language is concerned. I want compile-time safety in my SQL. Tbh, I'm surprised that raw SQL folks don't promote some type of non-ORM but compile-time type-safe SQL implementation. The runtime-ness of SQL strings has always blown me away.

I imagine it would be pretty easy too. Move SQL out of the programming language (i.e., into files). Verify the syntax. Compare the SQL to the DB to ensure validity. Bam: compile-time verified SQL. Though I've never used a setup like that, as that is what my ORM does, just in-language.
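
A sketch of that build step, assuming PostgreSQL and psycopg2, with queries kept as parameter-free .sql files in a hypothetical queries/ directory:

    import pathlib
    import sys

    import psycopg2

    def check_queries(sql_dir: str, dsn: str) -> int:
        """Fail the build if any .sql file doesn't parse/plan against the dev DB."""
        failures = 0
        conn = psycopg2.connect(dsn)
        for path in sorted(pathlib.Path(sql_dir).glob("*.sql")):
            try:
                with conn.cursor() as cur:
                    # EXPLAIN parses and plans the query against the live schema
                    # without executing it, so syntax errors and missing
                    # tables/columns surface here, at build time.
                    cur.execute("EXPLAIN " + path.read_text())
            except psycopg2.Error as exc:
                print(f"{path}: {exc}", file=sys.stderr)
                failures += 1
            finally:
                conn.rollback()
        conn.close()
        return failures

    if __name__ == "__main__":
        sys.exit(check_queries("queries", "dbname=app_dev"))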


Nim has a library (ormin) that does this, although it is still far from production quality.

It parses your SQL (at compile time), including the "create table" definition, does type checking, and rebuilds it with parameters (so that it is SQL-injection safe and the DB optimizer can make a plan once and reuse it).

Obviously, it needs to be aware of specific SQL dialects if you need to use them... so it's unlikely to be very popular.


In Java I like to use jOOQ for this. It generates classes and constants from your database. Then you can use the library to build your queries using those generated constants and classes.

This gives you type safety and the ability to use IDE refactoring tools.


Nice. Yeah, in Rust I use Diesel for this as well. I imagine any ORM worth its salt basically handles this same thing. Even code generation from the DB is basically a verification of the schema.


Your compiler doesn’t — and can’t — know what’s in your database.

To the extent you create a system that requires this constraint, you’ve created a brittle (and soon to be broken) system.

(I suppose there may be some case where all the data is all known up-front and will always be updated in lock-step with the consuming code/services but that’s rare in my experience.)


> Your compiler doesn’t — and can’t — know what’s in your database.

There's no reason you couldn't compile code against a schema, at least the portions that would be exposed to the application anyway (independent of the data content and non-exposed backing parts of the schema), just as you do against header files (independent of the implementation). It is a form of coupling, but it's coupling that exists anyway in DB-consuming code; it's just not typically statically verified and so is prone to unnecessary run-time breakage.

Unfortunately, you need tooling to statically analyze SQL schemas in whatever flavor of SQL you are using (and vendor differences will matter here), including inferring types through view definitions, etc., and then you need tooling to map that to the type system of your implementation language, and potentially you need extension points so that you can do custom mapping for custom types from the database side.

Unless your DB and application platform share tight common control or are near ubiquitous, getting someone to make the investment to do this and keep it current is hard. It's not too hard to imagine Microsoft doing it for the SQL Server / .NET combo or Oracle doing it for Oracle DB / Java, but a solution general and well-maintained enough to be usable is harder to see getting the kind of support it would need.


Sure, you can validate your database's schema against types in your code. Nothing wrong with that either, as far as it goes.

But that isn't a guarantee of anything at runtime.

You could make a runtime requirement that the schema and code match, but you're going to pay a price for that tight coupling. E.g. you'll need to put in place a mechanism to be able to update all your database clients and all your database schemas atomically. Once you get a decent amount of data in your database or scale horizontally, this could become infeasible (e.g., due to the downtime to update the schema) or impossible (e.g., because you don't have complete control of all databases, data, and database clients).

If you are going to go in this direction, you're usually going to be better off targeting a mapping layer/API, not the base tables. The coupling is not so rigid (e.g. there's room for simultaneous support of multiple versions of the data access API/schema). This all exists, of course; ORMs and other higher-level data access libraries generally do this kind of thing to a greater or lesser degree. Importantly, the mapping layer/data access API should live with the database - that is, be developed and deployed with the database (directly or in controlled parallel).


> Your compiler doesn’t — and can’t — know what’s in your database.

At compile time, yes it can. Not inherently the compiler itself, but at compile time the process can fail to build if your models do not match the schema in the DB. Diesel, for example, keeps the schema in code and (optionally) compares it to that of the DB, ensuring at compile time that your code matches the schema of the DB.

That of course doesn't handle changes to the schema post-build, but I hope no one is ad-hoc modifying their DB :)

edit: And this becomes even more of a local, isolated solution if your ORM is also managing your migrations. Which, in the case of Diesel, it also (optionally) does.


Maybe I should have added: it doesn't know what's in your database at runtime.

It's not that you'll be ad hoc modifying your DB. It's that your schema and your application-side entities will change over time (continuously during development, and somewhere between continuously and periodically in production, depending on how you release). Generally you won't be able to guarantee that they are updated in lock-step, or you won't want to pay the price to ensure that they always are.


When ORMs work as expected, they are wonderful, but when they don't, you can waste a lot of time figuring out why, because they are complex contraptions with lots of parts and rules. They are black boxes, or at least dark gray boxes such that if they don't produce the expected answer, your hair will turn light-gray trying to figure them out.

I would suggest using "sub-helpers", utilities that make using SQL and/or stored procedures easier, but don't hide all of the SQL when you don't want to. They automate the common grunt-work, but you are not forced to use them if they don't apply to a situation.

Come up with shop conventions for how and where to join, how to handle reference tables, etc. Then automate/abstract around these conventions. Tune based on lessons from the last application and they will improve over time. It's difficult to get abstractions right the first time.

In other words, wrap and automate the repetitious parts primarily. Use abstraction where it works best and don't use it where it doesn't.
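
To make that concrete, a minimal sketch of such a "sub-helper" (assuming psycopg2; all names are illustrative) that automates connections and row mapping but keeps the SQL in plain sight:

    from contextlib import contextmanager

    import psycopg2
    import psycopg2.extras

    @contextmanager
    def query(dsn, sql, params=None):
        """Run a hand-written query and yield dict-like rows."""
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
                cur.execute(sql, params or ())
                yield cur.fetchall()
        finally:
            conn.close()

    # The SQL stays visible at the call site; the helper only removes grunt work.
    with query("dbname=app", "SELECT id, name FROM widgets WHERE name = %s",
               ("Widget",)) as rows:
        for row in rows:
            print(row["id"], row["name"])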


> They are black boxes, or at least dark gray boxes such that if they don't produce the expected answer, your hair will turn light-gray trying to figure them out.

I'd love to hear specifics about some of your personal experiences like this if you're willing to share.


> The argument is that although they appear to offer you value up front, they cost you much more down the road.

I always enjoyed this[0][1] take on that sentiment.

“Although it may seem trite to say it, Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.”

[0] https://blog.codinghorror.com/object-relational-mapping-is-t...

[1] https://web.archive.org/web/20180118171352/http://blogs.tedn...


This argument against ORMs is like saying that just because I need to drop down to assembly to get maximum speed in a few cases, I should throw out all the C code and redo the whole application in assembly.


It's more like writing a Python framework that dynamically produces Ruby code to be executed, with the results returned back to Python. Sub out Python and Ruby for any other pair of languages. At some point, your generator framework must be at least as complex as the underlying language in order to produce the same functionality.

ORMs seem fine for SQL because, on the surface, SQL appears to be a simple language. But that image falls apart the minute you try to use some (relatively) esoteric but important language feature.


> But that image falls apart the minute you try to use some (relatively) esoteric but important language feature.

No, it doesn't. The only people I hear claim this are people that don't have decent experience with ORMs. Mature ORMs also act as data mappers when you need more fine-tuned control.


Yes, just because I need to drop down to SQL to use a few esoteric features in the 1% case doesn't mean I cannot use an ORM to make my life easier in 99% of the cases.

I just don't see the point of the argument that an ORM must have 100% SQL coverage or else be abandoned. Most ORMs allow dropping down to SQL when needed. It's easy to have the best of both worlds.


No, it’s actually not like that at all...


SQL itself is a leaky abstraction. You're essentially hacking a SQL expression and permuting it in different ways so it executes a specific imperative search algorithm.

An ORM is like a backwards abstraction. SQL abstracts imperative instructions into an expression. An ORM abstracts a SQL expression into imperative object-oriented instructions. I get the reasoning, though: it's to keep the developer from having to write foreign code within code.

Sometimes, the ORM abstraction just isn't expressive enough. So I have this idea for an ERM: an Expression Relational Mapper. Basically it compiles an expression-based language into ORM methods. So "SELECT * FROM TABLE" becomes table.all().
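
As a toy sketch of that idea (entirely hypothetical), compiling one trivial form of such an expression into an ORM-style call:

    def erm(expr: str) -> str:
        """Compile a (tiny) SQL-like expression into an ORM-style call."""
        tokens = expr.strip().rstrip(";").split()
        # Handle only the simplest form: SELECT * FROM <table>
        if len(tokens) == 4 and [t.upper() for t in tokens[:3]] == ["SELECT", "*", "FROM"]:
            return f"{tokens[3].lower()}.all()"
        raise ValueError(f"unsupported expression: {expr}")

    print(erm("SELECT * FROM TABLE"))  # -> table.all()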


This is how LINQ in .NET works.


By now, we should be able to transition between ORM and raw SQL code without anyone batting an eye. It's not hard at all. You can't convince me people are actually worried about an extra ORM dependency these days when we have node_modules.

I would much, much rather junior devs use an ORM to start, simply to prevent SQL injection while they skill up on SQL itself. But there aren't many companies who will hire you and give you a month to learn SQL from the ground up and do nothing else (learning SQL is NOT like learning another compiled/interpreted language -- there are new concepts, plus the DBMS's own language idiosyncrasies, which you're definitely using in prod).
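
The injection point is worth spelling out for those juniors; it comes from pasting values into the SQL text instead of letting the driver bind them (a runnable sketch with sqlite3; names are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('Widget')")

    name = "Widget"

    # Unsafe: the value is pasted into the SQL text, so a malicious value
    # can change the structure of the query.
    rows = conn.execute("SELECT * FROM users WHERE name = '" + name + "'").fetchall()

    # Safe: the driver binds the value; it can never alter the query itself.
    rows = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()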


This is reflective of my experience with the ORM zealots as well... SQL is already an abstraction, and it happens to be one of the most well-thought-out, battle-tested abstractions we have in the software world... it drives me nuts when junior devs and devs with poor SQL abilities want to pollute a codebase with EF, Hibernate, etc., without understanding whether it's necessary or even helpful beyond simply allowing the developer to continue onward without spending time becoming proficient in SQL. That said, there are scenarios where an ORM is helpful, but I personally find those occasions to be the exception.


> SQL is already an abstraction

Still a leaky one, though. Sure, the syntax likes to pretend that you "just describe what you want", but in practice getting queries on large data sets to perform well involves layers of optimization, all the way down to deciding on how data is partitioned between storage backends and what the on-disk format of it and its indexes needs to be.

In fact I view SQL as a somewhat thin abstraction, with a bunch of unavoidable (or at least unavoided) holes. Maybe that's part of the reason it's been so successful.


Another way of saying it: database abstractions should be specific to your codebase, rather than use generalized patterns.


My bone to pick with ORMs is slathering them all over the place like too much mayo on a sandwich. They are very useful tools for some situations and more trouble than they are worth in others.

Where they really shine is writing/editing data, managing transactions, and migrations. Even for situations where I know I'm going to use the ORM very little, I'll model the database in SQLAlchemy, for example, so I can version the database with Alembic.
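
(The migration side is lightweight, too; an Alembic revision is just a small Python file. A sketch, with hypothetical revision ids and columns:)

    from alembic import op
    import sqlalchemy as sa

    # Hypothetical revision identifiers, normally generated by
    # `alembic revision`.
    revision = "a2b4c6d8e0f1"
    down_revision = "9f8e7d6c5b4a"

    def upgrade():
        op.add_column("widgets", sa.Column("button_count", sa.Integer(), nullable=True))

    def downgrade():
        op.drop_column("widgets", "button_count")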

Reading data, I generally want more flexibility and don't want to have to push a code change to include another column here or there. For datagrid stuff, you develop a dynamic grid once for your app that just displays whatever the SQL spits out and then you can push changes by updating a stored proc. Reading data is also where a lot of complexity adds up quickly if it's directly from a normalized transactional database. And if you need to manipulate data that's already in your database, obviously, the best tool for that job is SQL, not an ORM.

Where a lot of the friction comes in is where people start off using an ORM and don't make any allowance for what's going to happen when you need to use SQL. Then using a little requires a ton of effort. Conversely, you have the same problem with projects that are designed to only use raw SQL, and then it's a big overhaul to use the ORM where it's genuinely useful, adds clarity and safety.

At this point, I just assume that every project is going to be a hybrid of the two, set up my boilerplate wiring and don't worry about it after that.


The argument against ORMs isn't that they are a leaky abstraction

From what you say, they are indeed a leaky abstraction. It's just that the conditions in which they leak are not easily accessible from the "Hello World" level demo. This is why the leaky analogy is good. It's not hard to make a boat that floats, period. It's hard to make a boat that will remain reliably watertight on the open ocean on rough seas.

yes, all abstractions are at least somewhat leaky, so you are better off using simpler ones than complex ones

This is like the Russian philosophy toward military equipment: it's better that it's simple, robust, and can take a lot of dirt and abuse, because this will serve better in actual combat (or "production") conditions.


Dropping down to a lower level of abstraction from the get-go because you might need that level of control for performance reasons in the future sounds like a typical case of premature optimization to me. Of course any sufficiently complex system is eventually going to need specifically optimized code at a lower level of abstraction in some places (usually then wrapped in an abstraction of your own), but that should always be done on a case-by-case basis as necessary.


In this case, I'm not sure I believe SQL is lower level than an ORM. SQL actually seems higher level than what most OO languages give you, and I wonder if that's a big part of the problem with ORMs.

Maybe that's why the abstractions that ORMs give you generally seem more complex than SQL.


> Dropping down to a lower level of abstraction from the get-go because you might need that level of control for performance reasons in the future sounds like a typical case of premature optimization to me.

On the flip side, I can make use of the code generation tools I have collected (some written myself, some by others) and write code just as fast, if not faster, using raw SQL and simple data structures than I did when I used ORMs.

There are infinitely many possibilities in between:

- A developer who is experienced in modeling and SQL will deliver working software faster than a developer who might be experienced, but not with modeling and not familiar with a particular ORM or a particular version of that ORM. We could continue ad nauseam here.

The main point: if all things are equal, meaning you have two developers, each comfortable and experienced in their tools of choice - the "raw SQL and simple data modeling" developer and the developer very comfortable in their ORM of choice - the raw SQL developer will deliver working software at the same time as the ORM developer, but with a smaller surface area for buggy software (which comes from the incidental complexity of using an ORM) and without the poorer database designs, since naturally, developers that rely on ORMs tend to produce poorer or suboptimal database designs (see this classic talk: https://www.youtube.com/watch?v=uFLRc6y_O3s).

Again, if the rebuttal to the database design point is "well you should still know SQL", then what point does the ORM serve?! To write less code? Anyone can write a code generator. If in order to be an effective ORM user I have to know the ORM and understand the SQL it generates to tweak the ORM flags or features to make it generate better SQL, then not only is it a leaky abstraction, it's a bad abstraction. I rely on my compiler to write assembly for me because I can't write assembly better than my compiler and I don't know x86_64 better than compiler developers. If I can't rely on my ORM to write good-enough SQL, and I have to know the underlying things going on with the ORM, then why use the ORM? Again, "less code" by itself is not a sufficient answer.

As your database fills, the performance penalty for using the unoptimized mapped queries and the unoptimized data schema will start to increase by whole factors.


Lack of experience in this case being, "Haven't blamed problems on the same things as I have."

Stances like yours sit in defiance of so many thousands of people who've been able to extract value from an ORM. Maybe the issue you're having is specific to you, not the tool.

Are they perfect? No. Are your incredibly limited set of experiences universal? Also no.


So your generalization is: "people who understand SQL really well would never use an ORM unless forced by their job".

That is, you are claiming those who choose to use ORMs must lack experience. This is what you have observed, in your experience.

Can I therefore claim, since I know tons of SQL experts who use ORMs and even create new ones, that you "lack experience" as well? Because your claim is wildly untrue.

edit: I apologize for the snark, but you are literally claiming a whole segment of the programming userbase is less experienced than you are, based on their choice of tools, while not considering that perhaps you haven't worked with many of these kinds of tools to know what's really out there and how different ORMs approach the problem.


I admit the comment was meant to be a bit tongue-in-cheek, but it doesn't appear to have had that impact. My intention wasn't to be condescending or anything like that, as others have taken it to be.

A better argument would have been to explain why ORMs are bad.

In my view:

- They encourage bad data modeling practices. Eventually there will come a time where the way your ORM maps your data model to physical tables is inefficient. Maybe you really have event-based time series. The impedance mismatch becomes exacerbated.

- They are an extremely leaky abstraction. It is hard to think of a more widely used abstraction that leaks more. You have to be familiar with your ORM's SQL generation. Every single comment here has admitted that yes, you eventually do have to drop down into raw SQL, so this part isn't even subjective at this point. The article is about leaky abstractions; ORMs are the poster child for them.

- On the other side of things, there are simpler abstractions between SQL and ORM that provide more value comparatively for the leakage, and my argument was that you would be better off having simple leaky abstractions than one massive behemoth of an abstraction that leaks more than a waterfall.

- Really hammering the point home here on the leakiness: for being such a leaky abstraction, after it is all said and done, in my view they really don't provide you with all that much value, given that you have extensive experience working without them (the source of my seemingly condescending point about experience; some languages guide you to never working without them).

I can just as easily use my code generation tools to have "less code to write", map some functions to SQL statements using my language's favorite DB drivers, and use a battle-tested and proven SQL migration and schema tool, and I can do this just as fast as any experienced ORM developer.

In this view, just as we have had the movement away from big monolithic frameworks to smaller opinionated sets of libraries stitched together with simple glue code, your ORM is the massive monolith in this case, and your simpler tools like code gen and migration tools are the smaller libraries.


Unless ORMs are 1:1 compatible with SQL features, some developers will inevitably be required to write SQL by hand.

Lots of developers are happy to leverage stored procedures and let the database do heavy lifting to reduce network traffic (or whatever reason). Others prefer the database to simply be persistent storage for data structures and leave the logic to their Java/C++/PHP code. One of those camps will never be able to use ORMs, and the other may live by them.

It stands to reason that more advanced SQL developers will leverage more advanced SQL features, some of which aren't available in ORM implementations.


But that doesn't matter; you can still use tools to help compose the 90% of SQL that is quite boring.

Truth be told, I think it is the "I write 100% of my SQL by hand" crowd that is still quite junior in their SQL experience. How many times would they like to write a rote INSERT statement over and over against their data records before realizing an abstraction would save them lots of typing and redundancy?

Edit: also note that ORMs most certainly can be used with stored procedure architectures, in theory. I'm not sure if such tools exist, but if I had to use SPs, I'd be writing tools both to generate the SPs from a given set of models and to do the runtime work of marshaling my objects to and from these SPs via cursor.callproc(), in the same way an ORM does it for plain SELECT / INSERT / UPDATE / DELETE statements via cursor.execute(). Additionally, systems like PostgreSQL's pluggable SP languages, which include Python, might even allow SQLAlchemy models to work inside a stored procedure, something I've always wanted to try. Both concepts are things I'd gladly pursue if someone wanted to pay me a few years' salary.


> It stands to reason that more advanced SQL developers will leverage more advanced SQL features,

I'm one of those advanced SQL developers, and an ORM has never gotten in my way. Don't blame the tool for the user's lack of experience.


I'm not blaming anyone, I'm just pointing out that ORMs aren't going to have every SQL feature. Django didn't add window functions until rather recently. They're great for most use cases, but you're going to have to drop down to SQL if you want to use advanced features which are not available in your ORM of choice.


> I always find that people who don't believe writing raw SQL is preferable to using an ORM usually suffer from a lack of experience

> The ORMers, as I call them, don't have confidence in their own SQL ability

SQL is a leaky abstraction too. Every time you look at query plans and try to tweak the right number and kind of indices, you are having to deal with the drawbacks of the SQL abstraction. That does not mean the people who do that "don't have confidence in their own file-system ability".


Some abstractions are basically perfect and don't require you to understand the layer beneath them at all. Some have that as their ideal, but are imperfect, and sometimes require you to understand a few details of the layer below. But others - the best example that I know of is SCSS - aren't even trying to eliminate the need to understand the layer below. Everything you need to know to write CSS, you still need to know to write SCSS, and then SCSS adds more on top. It's a convenient tool for power users and I wouldn't want to be without it, but for a beginner, using SCSS instead of CSS means you have strictly more to learn; I'd suggest that a newbie on my team not touch SCSS until they've done a few days' work with raw CSS.

While less clear-cut than SCSS/CSS, it seems to me that ORMs/SQL have a similar relationship. Frankly, without knowing SQL, you can't hope to competently use an ORM; you won't know what sort of queries and updates it's capable of or what its performance characteristics will be, let alone more subtle stuff like what indexes to create or how the database's transaction model works. But even once you DO understand the SQL layer, getting to grips with how ORMs work is a major additional hurdle.

For that reason, while I am comfortable using an ORM for projects I work on, I wouldn't recommend that a beginner do the same. If I did, I'd be doubling the amount they have to learn, and creating a risk that they'll give up in despair trying to figure out how something poorly-documented works at the ORM layer because they don't have the knowledge of the underlying SQL layer to intuitively guess what their ORM code must be doing under the hood.

It's perhaps more radical than anything I believe, but I think you can reasonably go further and make the case that the extra learning curve imposed by ORMs is not worth the minor convenience benefit they offer to the proficient, and that for that reason, it's generally better not to use them. The fact that they leak the entire layer underneath them is part of that argument, but not all of it; the other part is that the layer they build on top of that is difficult-to-learn and only adds a small bit of convenience in exchange.


Some abstractions don't seem to give anything useful at all other than homogeneity across code bases and slightly nicer looking (but not simpler) code, and yet they still see wide use.


If "nicer looking" aids readability and more understanding of what the domain logic is trying to achieve, that's a pretty substantial win in my book.

One advantage of homogeneity is that it allows you to not have to think as much about how something was implemented, and focus on why that code exists and what it's trying to accomplish at a high level.


Some abstractions don't seem to give anything useful at all other than homogeneity across code bases and slightly nicer looking (but not simpler) code

This is of tremendous value to big companies. This lets them move to newer languages and platforms more easily when the time comes.


For sure. It's also very good for onboarding (and for better utilising cheaper/less knowledgeable devs by constraining them, but people don't like to admit that).

The flip side is that abstractions (usually in the form of external dependencies) can only add complexity to the software, which has a detrimental effect on development effort. These "pointless" (but not really) dependencies we're talking about are the same as useful ones, but instead of weighing the cost against a more direct benefit (whatever the library gives), you're weighing it against these more indirect benefits, which is much harder.

In my experience most people just give up and go with whatever sounds good, which is usually "use whatever people are familiar with". That subset of tools grows and grows over time, and if you just always go with the defaults, eventually you're going to end up stuck in the mud, because nobody is weighing the cost of your abstractions.


These "pointless" (but not really) dependencies we're talking about are the same as useful ones, but instead of weighing the cost against a more direct benefit (whatever the library gives), you're weighing it against these more indirect benefits, which is much harder.

Facades and the various "layer" style abstractions aren't pointless. The whole point is to add a layer of indirection, for the benefit of being able to switch out underlying dependencies. The whole point is to make it easy to migrate off of a dependency.


You can't really dismiss abstractions anyways, because everything is an abstraction. SQL is most definitely an abstraction, even machine code and assembly are abstractions. Abstractions aren't reality (the map is not the territory and all that), but they capture some aspect of reality, in some sense, and if that aspect happens to be the part you care about, they're useful. It's not about not using abstractions, it's about choosing the correct abstraction for the given problem domain.


Abstractions should also simplify.

If an abstraction makes something unnecessarily complex, then using abstraction for the sake of abstraction isn't useful.

In the spirit of YAGNI, you shouldn't try planning too far ahead with your abstractions. From my experience, sometimes your clever planning all falls into place, but most of the time something significant shifts in client demands and your abstractions don't make much sense anymore, adding nothing but a layer of complexity to deal with that didn't need to be there to begin with.


> As an example, it's pretty common to hear that people should prefer writing raw SQL to using an ORM or query builder because those are both "leaky abstractions"

The takeaway of ORMs being leaky abstractions is that they should expose the lower abstraction. An ORM builds the query for you 90% of the time, but when (not if) you need to write raw SQL, the ORM should be able to integrate the raw SQL output into its flow.

An example of an ORM that does this beautifully is SQLAlchemy [https://www.sqlalchemy.org/]
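
For instance, a hand-written statement can still hand back mapped objects (a self-contained sketch; the Widget model is invented):

    from sqlalchemy import Column, Integer, String, create_engine, text
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Widget(Base):
        __tablename__ = "widgets"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Widget(name="Widget"))
        session.commit()

        # Raw SQL in, mapped Widget objects out: the escape hatch stays
        # integrated with the ORM's flow.
        widgets = (
            session.query(Widget)
            .from_statement(text("SELECT * FROM widgets WHERE name = :name"))
            .params(name="Widget")
            .all()
        )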


The main problem with ORMs is not that they are a leaky abstraction. It's that for any minimally complex problem, they are a bad abstraction.

Abstractions usually empower their user, ORMs remove power.


To be fair, most database accesses many people are going to be doing aren't even minimally complex and having an ORM saves you a lot of tedious, boring, potentially error-prone work in those cases.

This shouldn't stop developers from actually writing a SQL query if the task at hand warrants it and using an ORM doesn't usually stop you from doing that.


Good abstractions allow themselves to get out of the way when necessary. There are for sure bad examples of abstractions, but there's also plenty of ORMs that let you bypass the abstraction completely when needed (or use the parts of it still applicable).

I also don't really understand your last statement. Abstractions, pretty much by definition, never allow you to do more than the underlying technology they're abstracting. They necessarily limit power in order to make code more comprehensible and maintainable. That's true for any abstraction I can think of, leaky or not.


In addition to getting out of the way, I prefer when an ORM allows me to extend it to build my own abstractions.

For example, in Rails when I have a complicated query, I can write some SQL, stick it in a view, and make a view-backed model.

Then I can keep using Active Record to make it easier to use the query in multiple places, easily compose additional queries, etc...


> Abstractions usually empower their user, ORMs remove power.

That is an opinion. One could also consider that the fact that you can use a programming language to model your database provides a lot of expressive power.


They (and maybe you?) are missing the point of the Law of Leaky Abstractions here.

The point is that no matter how good your abstraction there will be corner cases where it gets in the way. This is also a great example of the perfect being an enemy of the good. Some people will try to get that last gap out of their abstraction to the point of it then becoming a little bit worse in all cases and still not being perfect.

The point is that your abstraction shouldn't get in your way. It should be easy to bypass when you need to.

The problem with ORMs in particular is that this tends to be impossible. Every ORM I've used has added a caching layer, so then you can't run raw SQL queries, and the ORM is never as expressive as raw SQL. Sometimes you just want a subset of columns, but ORMs don't tend to support that, since a table is mapped to an object and it's either all or nothing.

Years ago, this is why I eschewed the likes of Hibernate and preferred things like iBATIS.


> Sometimes you just want a subset of columns

There isn't a mature ORM I know of that doesn't support this, either as partial bindings to the mapped object (with empty properties) or via whatever k/v primitive is supported by the language.
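
E.g. in SQLAlchemy (a fragment, assuming a mapped Widget model and an open session as sketched in other comments here):

    from sqlalchemy.orm import load_only

    # Plain tuples of just the requested columns...
    rows = session.query(Widget.id, Widget.name).all()

    # ...or partially-loaded Widget objects with only `name` populated
    # (other attributes lazy-load on access).
    widgets = session.query(Widget).options(load_only(Widget.name)).all()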


Indeed, software engineering pretty much consists of abstractions. (And computer science is basically the science of formalized abstraction; that's basically what CS is.)

If we didn't have any abstractions at all... we'd all be writing machine code in 1s and 0s (and even that is an abstraction, of course). That wouldn't get us very far.

People, including in this thread, also often talk about a particular abstraction being "a leaky abstraction", implying some abstractions are, some aren't. Whereas in fact the 'law' quoted says that all "non-trivial abstractions, to some degree, are leaky." They're all leaky; the map is never the territory. But it's the business we're in. (Some abstractions are more useful, more successful, and/or more dangerous than others... always in a given context of usage.)


I agree all engineering is abstractions. Every function, data-structure, program.

>> [People] talk about a particular abstraction being "a leaky abstraction". Implying some abstractions are, some aren't.

There are degrees of leakiness though, which matter a great deal. The concept of a "file" is an abstraction; instead we could all write directly to disk sectors. But its value outweighs the leak associated with it.

Other abstractions aren't always so useful, sometimes a coworker makes a class that's riddled with exceptions and caveats.


Some abstractions are definitely more useful/successful/robust than others.

(I think I want to say usually for a particular context, not as a universal/existential thing. Sometimes that context grows to be everyone's... we all use a 'file' abstraction because it was so useful the entire field adopted it, stopped considering alternative abstractions, and now takes it as a starting assumption to work within the context where 'file' is useful (except in extreme edge cases, maybe) -- for better or worse.)

Deciding which is which, whether a particular abstraction's value outweighs the "leak" in particular contexts or in "general" -- well, that's what endless internet debate is made of. It's usually not clear as a consensus until quite some time has passed (and even then heretics can challenge it).

I agree it's the right question, but it can't be answered simply by saying "leaky abstraction".


Really, I think the point is to know what you are abstracting for, and not simply to abstract dogmatically for the sake of abstraction or "purity".

If you expect to have a system which handles the low-level details, that is fine. If you don't want it to be possible to mess up state consistency, at the cost of possible performance, fine.

However, if you expect abstraction to mean latency doesn't vary at all regardless of whether something is in on-chip cache, in RAM, or on a hard drive, you are going to have a bad time, unless you are dealing with a library that specifically tries to mask timing attacks.


To agree with this inside of a broader context: I would argue that there are so many abstractions in language (“good”, “evil”, “god”, even less lofty things such as “color”, “heat”, “weather”), in science (“gravity” would probably be an arguable but interesting example, as well as one that came up recently with a neighbor of mine), and even in basic functional things (including “plumbing”, “electricity”, and even “computers” at quite literally every level of understanding above mathematics), that, keeping all of this in mind, our entire sense of being and living in the world would likely be completely debilitated without making use of an innumerable number of abstractions.

That said, there are always valid arguments against the use of particular abstractions, as they can be seen in practice as being largely more detrimental than helpful. Many make this argument against "god". My high school physics teacher made the argument against the use of “centripetal force.” I myself like to remind people that they probably know far less about the world around them than they think, simply because they have replaced conceptual knowledge with a definition and categorical knowledge.

Though at the end of the day, I feel that abstractions in general are quite harmless, as long as you are aware of the fact that they are just that: “abstractions.”


SQL is itself a high-level abstraction, and I have seen cases where an ill-conceived data abstraction layer has just added complexity without improving the quality of abstraction. (I have also seen many cases where poorly-designed database schemas have nullified the relational model's power of abstraction.)

With regard to your question, I want to be able to see both the abstraction and the implementation, as, depending on what I am doing at any given time, I might need to understand either.


Abstractions are also great for creating sections of testable code, separate from other parts that are harder to test (database communication, requests, IO...).

But they can also get in the way when they are too coupled with the code, or when the layers of abstraction get out of hand: "the only problem that more abstraction can't solve is too much abstraction".

On a side note on ORM leaky abstractions: that was particularly clear to me when using CakePHP. It was really hard to debug what was going on in the background, and getting the actual query being run, with its placeholders in place, was a nightmare. SQLAlchemy, in contrast, is extremely close to the actual query, and with code you can find online you can actually generate the SQL string; it is always very close to what I had in mind in the first place, thus properly integrating the extra expressivity Python has into building SQL queries.


I would rather see the first one. I'm no expert on software development; however, I've seen that as time progresses new requirements are added. Cross-cutting requirements are especially hard to implement on your second option; imagine a request ID has to be sent. My 2c anyway.


I think `fetch` is the real leaky abstraction.


A fun example from SQL Server that bit me the other day. Observe this simple proc:

  create procedure SearchJackets
    @buttonCount int
  as
  select *
  from   jackets
  where  @buttonCount is null or buttonCount = @buttonCount

It's a common pattern you use for multi-param search queries that lets you avoid building your query as a string. Normally there are lots of little filter params in there, but we'll just show one for now. "Computer: List me the jackets, and maybe just show the ones with a certain number of buttons."

That version takes 60 seconds to run, whereas this version takes less than a second:

  create procedure SearchJackets
    @buttonCount int
  as
  declare @buttonCountCopy int
  set @buttonCountCopy = @buttonCount 
  select *
  from   jackets
  where  @buttonCountCopy is null or buttonCount = @buttonCountCopy 

... because it kicks the optimizer upside the head and convinces it to reconfigure the query plan in an efficient way (copying to a local variable defeats SQL Server's parameter sniffing, so the plan is no longer built around one sniffed parameter value). There are hints you can use at CREATE time that are supposed to do the same thing, but they don't actually work in this particular case. So now one of my codebases has this hack sprinkled about in a couple "Advanced Search" pages.

Fun stuff.


>There are hints you can use at CREATE time that are supposed to do the same thing, but they don't actually work in this particular case.

Are you saying that "option(recompile)" doesn't work for you in this case? It works just fine for me when using that 'X is null or' pattern for optional filters.


Indeed, that's the option I was testing that had no effect. From what I've read, it does seem to work in some cases. But not mine, sadly.


Well that's interesting. I'll have to keep your little hack in the back of my mind. Thanks!


Annoying that that should be necessary, but good to know! Out of curiosity, what does that particular inefficiency look like in the execution plan? Does it do a full table scan when @buttonCount is passed in null, or was it inefficient for any value?


In your second proc, should

     set @buttonCountCopy = buttonCount
be

     set @buttonCountCopy = @buttonCount 
?


Yes. Fixed.


"for a magnetic drive, reading data sequentially will be significantly faster than random access (due to increased overhead of page faults),"

...err... no.

Magnetic drives have slow random access due to seek time, i.e. the time taken for the head and disk to physically change position.

In comparison to that, SSDs are effectively zero latency, but they still have read-ahead/buffering/caching latencies to deal with.


The quoted text appears to be comparing the sequential and random-access speeds of magnetic disks, so the differences with respect to SSDs do not come into it. On the other hand, I do not understand what the author means in the following clause, where the slower random access is attributed to the overhead of page faults, unless the author has in mind a specific (and unmentioned) scenario involving memory-mapped access. (And if page faults are the issue in that scenario when using magnetic disks, why would one not have the same issue when using SSDs? I would have thought the causality goes the other way: the overhead of page faults is higher when using magnetic disks because of their relatively slow random access.)


The difference is that magnetic disks have a read head that must physically move (slow), so it's much faster to access data under the current head position. I guess access time is roughly the same for any data on an SSD.


Possibly... All I can surmise is that the OP may think disks are addressed in a memory-mapped fashion and hence may be subject to page faults for some reason.

(Obviously, they're not.)


Reading that generously and since we’re talking about abstractions... reading from disk via mmap does work via page faults! Except... it’s a layer up. Doing random reads on an mmap’d file will likely have terrible performance until those pages have been cached, but one layer down there’s no guarantee that sequential reads from an mmap’d file are going to be sequential reads from disk! (Because the file isn’t guaranteed to be laid out sequentially on disk)

Others in this discussion have talked about some abstractions being perfect and a consumer not needing to understand the layer beneath; I strongly disagree. Ultimately, the physical reality of the machine will come into play (disk, RAM, caches, network, CPU etc), and I am generally uncomfortable if I don’t have a solid feel for how the high-level operations in an abstraction are going to use those resources.


I feel "abstraction" (as a term) is thrown around a lot, but should only apply to the conceptual system (aka, while in the computer science realm).

Once you're dealing w/ the implementation and it turns into something concrete (aka the engineering realm), it's just a layer of indirection (and it serves its purpose by reducing coupling), but you still have to consider that underneath it all you're talking to some machine over the network: there's latency, it can time out, there could be a load balancer in between, the cache may not be coherent for consecutive requests, you assume some operation is both atomic and produces instantaneous side-effects, and so on... if you ignore these details you end up w/ a system that looks great at a conceptual level but is full of problems and race conditions, because you ignored physics.

TL;DR "Leaky abstraction" is pretty much a tautology if you look up the meaning of "abstraction"?


Re: Modern practices like 'Microservice Architecture' can be thought of as an application of this law (The Unix Philosophy), where services are small, focused and do one specific thing, allowing complex behaviour to be composed from simple building blocks.

An OOP class or API can do the same thing. Microservices are overkill unless multiple applications will be sharing the service, and even then they may have unnecessary overhead compared to, say, stored procedures.


   However, for a magnetic drive, reading data sequentially will be significantly faster than random access (due to increased overhead of page faults), but for an SSD drive, this overhead will not be present. 
Even SSDs have overhead for random access, although not as much as spinning disks (~3 times, IIRC).


Even RAM has overhead for random access. It takes time to change which row of the memory you're accessing, and there's also a "burst mode" that reads or writes a batch of consecutive locations with reduced overhead.

[EDITED to add:] And of course caching means that the RAM+cache system collectively makes random accesses slower, and would do so even if all accesses to RAM took exactly the same amount of time.


I never understood how the myth that SSDs have no random access overhead became so prevalent and oft-repeated. Did nobody ever measure?


SSDs have effectively zero random-read access overhead when compared to traditional drives, because the overhead is a couple of orders of magnitude smaller. Also, for SATA-connected SSDs the effect of this read latency is reduced by bottlenecks elsewhere.

For the common home/office/other user the difference between zero and effectively zero is, well, effectively zero, so the two easily conflate. It isn't so much a myth as a convenient simplification.

(The fact that there is still latency is very easy to show though - just throw something like crystaldiskmark at an SSD and show the measured throughput difference between the sequential and random tests.)

For NVMe drives where the bottlenecks of SATA are removed the difference starts to become more noticeable, and on any SSD random write latency is more significant than random read latency, but NVMe has only recently become common for the general user and the tests that people usually look at are random read not random write as for those common users this is the most significant measure in terms of how it will affect their day-to-day use patterns.


You could say it's a leaky abstraction...


Nowadays there are NVMe SSDs with significantly less random access overhead, asymptotically approaching no overhead at all, as I understand it.


I think the key is to make sure all leaks happen in the sphere of performance, not correctness. So you can use the abstraction on its own without fragility, and then you can learn more about it if you want to tighten up performance.


A better formulation: abstractions do not adequately describe the full workings of a system, so higher-level approaches will always require some knowledge of lower-level operations.

However, the opposing and terribly named "Dependency Inversion Principle" is true much more often: "High level [approaches] should not be dependent on low-level implementations."
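
A minimal sketch of that principle (all names invented): the high-level routine depends on an abstract interface, and the low-level detail implements it, not the other way around.

    from abc import ABC, abstractmethod

    class Storage(ABC):
        @abstractmethod
        def save(self, key: str, value: bytes) -> None: ...

    class DiskStorage(Storage):
        """A low-level detail that implements the high-level interface."""
        def save(self, key: str, value: bytes) -> None:
            with open(key, "wb") as f:
                f.write(value)

    def archive(records: dict[str, bytes], store: Storage) -> None:
        # High-level policy: knows nothing about disks, only the interface.
        for key, value in records.items():
            store.save(key, value)

    archive({"a.bin": b"\x00"}, DiskStorage())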


Maybe https://github.com/denysdovhan/wtfjs counts as leaky abstractions.

It's definitely something that's difficult when building a language. Abstracting and generalizing your language parser can lead to ambiguous behavior.


Article could use a few more examples.


Yes, it would be nice to see something other than speed issues and slow networks.

I think encryption can be one. If you don't know the internals and handle it as a black box, you can lose security. For example, if you encrypt a string twice with a one-time pad, you get back the original.
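
That one is easy to demonstrate, since a one-time pad is just XOR and XORing twice with the same pad cancels out (a tiny sketch):

    def xor_pad(data: bytes, pad: bytes) -> bytes:
        """Apply a one-time pad (pad must be at least as long as data)."""
        return bytes(a ^ b for a, b in zip(data, pad))

    pad = b"\x13\x37\x42\x99\x01"
    msg = b"hello"

    once = xor_pad(msg, pad)    # ciphertext
    twice = xor_pad(once, pad)  # "encrypting" again undoes it
    assert twice == msg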

Another could be presenting content, for example saving content as HTML or PDF. PDF has a concept of pages, and you may get aesthetically suboptimal pagination if you try to abstract away the final format. You may even need to cut a sentence from the content to have optimal visuals.

Or if you tried to treat all image formats equally and autosaved to JPEG every minute while editing, the lossy compression would degrade quality.

The fix is usually exposing the particular aspect on the interface. The problem is that new things we think should fit some abstraction usually turn out to have peculiarities we didn't think of when designing the abstraction. We thought our abstraction was more general and future-proof than it is.


Joel's article ( https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a... ) has tons.

A classic trap for new players is that in most modern compiled languages you can add strings and get a string, but strings are in fact immutable and can't be added without making an entirely new string and disposing of the original two. This means "a" + "b" is actually a horrible way to build up a string if you have to do lots of little additions, so most languages have some other method of making a string out of strings/chars (StringBuilder in Java and C#, strings.Builder in Go, etc.).
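
The same trap, sketched in Python (CPython can sometimes optimize repeated concatenation in place, so treat this as illustrative):

    parts = []
    s = ""
    for i in range(10000):
        s = s + str(i)        # builds a brand-new string each time: O(n^2) overall
        parts.append(str(i))  # list append is amortized O(1)

    s2 = "".join(parts)       # one final O(n) concatenation
    assert s == s2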


I have a counterargument for you: https://www.researchgate.net/publication/221200232_Algorithm...

A variant of it, called rewrite rules, has been used in Haskell's compiler for ages and is at the heart of good vector algorithms (on par with C, including SIMD C).

I can't find the paper I read in 1998 or so where successive calls to fputc were replaced with fputs, and where, with some other rules and the same approach, OpenGL code was optimized to be as fast as possible.

It is a pity that research that is twenty years old was not put into the C# compiler.

And for a bonus, look at program distillation: http://meta2012.pereslavl.ru/papers/2012_Jones_Hamilton__Sup...

They transform a quadratic concatenation algorithm (O(n^2)) into a linear one (O(n)).


> most modern compiled languages you can add strings and get a string, but strings are in fact immutable and can't be added without making an entirely new string and disposing of the original two.

In my mind, I add two strings like so:

https://gist.github.com/seisvelas/c11d200d0040a3686e47af0068...

That is, I realloc the first string to fit the second inside of it, then add the second string into the new space. But I have no idea what I'm doing. I'm posting this comment so someone can explain to me why my way is bad and why I should be creating a new string to put the others inside (which apparently is what everyone else does!)


Why are strings made immutable by default in some languages (I have seen this mainly in Java and Python)? Nothing fundamentally requires strings to be so. Has some analysis been done indicating that most string operations in software would benefit from the immutable form rather than the mutable form?


At least in Python, string objects are widely used as keys to dictionaries or as options in functions. For speed and efficiency these small strings are "interned" so that there is only ever one instance of the same string.

Also, for mutable strings you either have to allocate enough memory to fit the final result or use some kind of rope data structure. Otherwise you end up copying anyway.


Great list of programmer laws, ideas, and understandings, not just "The Law Of Leaky Abstractions" (which of course is a classic in its own right...)


It's like the 2nd law of thermodynamics. A surjective epimorphism. Information is lost as you go higher and higher into the abstractions.



