"ORM is fine" 90% or even 100% of the time don't make them right. Thee problem is that OOP is not well defined, it belongs to "soft" science, and relational databases are "hard" science. I have read the article about ORM and Vietnam, in we just have all of the symptoms of impedance mismatch.
I have to add that this issue is much deeper than a simple efficiency issue. A perfect db with 100% availability would still not be well matched to an object system.
I listened to a podcast from 1990, where the guy said "the problem with objects is that we don't know what they are". I just finished SICP, and it seems to reach the same conclusion: objects, with their state, their messages, their multiple parents, their instantiation, are not a proper model for many or most cases in software development.
Examples of proper models include types, text files, streams (pipes), modules, and maybe services.
Set theory is hard science. On the other hand, a grapevine of stored procedures with no code reuse beyond copy-paste aren't any kind of science. I'm not even sure they are any kind of engineering.
You are raising an interesting issue here: does the DRY principle apply in db development? Not sure. Db development is closer to configuration. Anyway, I won't advocate putting all the logic in the database, so I won't get a grapevine, maybe just a fruit platter, with some duplication but not too much.
Related to your point would be an argument that software engineering is more productive when treated as a "hard" science (which I take to mean something like "rigorous justification of all steps" or "clear derivation from principle").
I don't think that's true at all, as evidenced by pretty much every successful software product ever (including the early relational databases, I should add) but would be curious to see the counterargument.
Software is a complex beast with multiple aspects. UI design can be taken seriously, but it is no hard science. Many interesting parts just can't relate to hard science, even where you have algorithms. But when handling factual data you have the possibility to ground yourself in hard science, and you should. It would be a stupid crime to build a tower without grounding what you can in maths and physics, no?
> ORMs are powerful, because they let you say less and do more
To me this is a bonus. The real deal is that ORMs allow you to write queries as "first-class" components of the language, hence benefiting from language features such as type checks, duck-typing, factoring, static analysis even, and more.
Compare this to stitching strings (however parametrized they are) and manually coercing your object values to strings.
This is where, IMHO, ORMs like ActiveRecord and Arel shine, as they give you access to each building block (from connection.execute to .quoted_table_name to .to_sql) so that you can place yourself at whatever level of abstraction between the two worlds you may need.
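To make the "first-class queries" point concrete, here is a toy, stdlib-only sketch (this is not ActiveRecord; the `Query` class and its `to_sql` method merely mimic that style): query objects compose like ordinary language values, benefit from normal refactoring and analysis, and can be dropped down to raw SQL when needed.

```python
import sqlite3

class Query:
    """Toy first-class query value; composition instead of string stitching."""
    def __init__(self, table, wheres=()):
        self.table, self.wheres = table, tuple(wheres)

    def where(self, clause):
        # Returns a new, refined query; the original stays reusable.
        return Query(self.table, self.wheres + (clause,))

    def to_sql(self):
        # Drop down a level of abstraction when you need the raw SQL.
        sql = f"SELECT * FROM {self.table}"
        if self.wheres:
            sql += " WHERE " + " AND ".join(self.wheres)
        return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, published INTEGER)")
conn.execute("INSERT INTO posts VALUES (1, 1), (2, 0)")

recent = Query("posts").where("published = 1")  # pass it around, refine it later
rows = conn.execute(recent.to_sql()).fetchall()
```

A real ORM does this with proper parameter binding and quoting; the sketch only shows the shape of the idea.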
I agree that the 'divide' is overstated. But the author does mention one frequent problem: when you begin a project in an ORM, you may begin to depend on validation or update callbacks that get triggered by the ORM. Later, if you discover queries that need to be written manually, you will first likely forget these rules, then have a hard time translating them into SQL, and then have an enormous, permanent headache keeping the two in sync.
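A minimal sketch of that trap, with a made-up `users` table and email rule: the validity check lives in an app-level callback, so a hand-written UPDATE skips it entirely, and only a duplicated schema constraint (which must now be kept in sync) stands between the query and bad data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL CHECK (email LIKE '%@%')  -- duplicated rule
    )
""")

class User:
    """App-layer model with ORM-style validation in a callback."""
    def __init__(self, email):
        if "@" not in email:                 # app-level validation
            raise ValueError("invalid email")
        self.email = email

    def save(self, conn):
        conn.execute("INSERT INTO users (email) VALUES (?)", (self.email,))

User("ann@example.com").save(conn)           # passes both layers

# A hand-written query bypasses the app callback entirely; only the
# schema-level copy of the rule catches it.
caught = False
try:
    conn.execute("UPDATE users SET email = 'not-an-email'")
except sqlite3.IntegrityError:
    caught = True
```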
The divide is fundamentally about choosing what's in charge of your system, the system being composed of your databases, your applications, and your supporting infrastructure (your scripts, your migrations, etc.) To relational database folk such as myself, the central authority is the database, and our principal interests are what ACID exists to provide: concurrent, isolated, atomic transactions that cannot be lost, on top of a well-defined schema with strong data validity guarantees. To us, what's most important is the data, so everything else must serve that end: the data must always be valid and meaningful and flexible to query.
The side that argues for ORM has chosen the application, the codebase, to be in charge. The central authority is the code because all the data must ultimately enter or exit through the code, and the code has more flexible abstractions and better reuse characteristics.
The reason for the disagreement comes down to disagreement about what a database is about. To the OO programmer, strong validation is part of the behavior of the objects in a system: the objects are data and behavior, so they should know what makes them valid. So the OO perspective is that the objects are reality and the database is just the persistence mechanism. It doesn't matter much to the programmer how the data is stored, it's that the data is stored, and it just happens that nowadays we use relational databases. This is the perspective that sees SQL is this annoying middle layer between the storage and the objects.
To the relational database person, the database is what is real, and the objects are mostly irrelevant. We want the database to enforce validity because there will always wind up being tools outside the OO library that need to access the database and we don't want those tools to screw up the data. To us, screwing up the data is far worse than making development a little less convenient. We see SQL not as primarily a transport between the reality of the code and some kind of storage mechanism, but rather as a general purpose data restructuring tool. Most any page on most websites can be generated with just a small handful of queries if you know how to write them to properly filter, summarize and restructure the data. We see SQL as a tremendously powerful tool for everyday tasks, not as a burdensome way of inserting and retrieving records, and not as some kind of vehicle for performance optimization.
At the end of the day, we need both perspectives. If the code is tedious and unpleasant to write, it won't be written correctly. The code must be written--the database is not the appropriate thing to be running a web server and servicing clients directly. OOP is still the dominant programming methodology, and for good reasons, but encapsulation stands at odds with proper database design. But people who ignore data validity are eventually bitten by consistency problems. OODBs have failed to take off for a variety of reasons, but one that can't be easily discounted is that they are almost always tied to one or two languages, which makes it very hard to do the kind of scripting and reporting that invariably crop up with long-lived data. What starts out as application-specific data almost invariably becomes central to the organization with many clients written in many different languages and frameworks.
We're sort of destined to hate ORM, because the people who love databases aren't going to love ORM no matter what, and people who hate databases will resent how much effort they require to use properly.
This is a fantastic comment, you need to post this as a blog post, and then submit to HN. This way I and many others will never have to rehash this again, just point back to it.
> We're sort of destined to hate ORM, because the people who love databases aren't going to love ORM no matter what, and people who hate databases will resent how much effort they require to use properly.
Speak for yourself. I love databases (note the plural) and love ORM. ORM is a godsend for developing applications that have to work against different databases (PostgreSQL, MSSQL, DB2, etc.).
To me it sounds like you're talking about the argument of who has the responsibility of applying business rules, the application layer or the database layer - which is another entirely valid argument in itself, but distinctly different from the argument of whether to use an ORM in your application layer or not.
To a database person, the database is all about business rules: what we recognize as a "thing", what we record about these things, and how they relate to other things. ie, what are the facts about the business, and how do we reason about them?
Any conformant client code must then honor these rules, and oftentimes that means it must re-implement them, which is an acceptable cost if we have decided to use an RDBMS in the first place.
Now it's true, a given database may only implement a subset of all applicable business rules--maybe some fall outside the scope of the database, maybe it's preferable to offload some to a trusted client, maybe the business and database model have drifted apart over time and no one wants to overhaul the database model due to all the dependencies involved.
That said, any rules that the database does implement are a good thing, especially the simple rules that can be implemented as constraints. And they're good because you can then program against them, from any client, from any code, inside the database and elsewhere, and you can make guarantees about what possible states the data could be in. This is generally a useful thing.
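As a small illustration (table names invented), a foreign-key constraint is exactly such a rule: every client, in every language, is bound by it, so you can guarantee the data never contains an orphan row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs per-connection
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(id)
    );
""")
conn.execute("INSERT INTO accounts (id) VALUES (1)")
conn.execute("INSERT INTO orders (account_id) VALUES (1)")   # valid reference

# No client code path, ORM or otherwise, can create an orphan order.
orphan_rejected = False
try:
    conn.execute("INSERT INTO orders (account_id) VALUES (99)")
except sqlite3.IntegrityError:
    orphan_rejected = True
```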
Agreed. If the costs were equal, I would always implement business rules in the database. I doubt many people would disagree with that. The problem is, it takes longer to write and debug. Despite this, I think there is a certain subset of rules that belong in the database without exception: constraints and things of that nature.
Would Reddit be a good example of this, where they use a relational database as a key/value store, don't use an off-the-shelf ORM and still depend on the application layer for all the business rules?
I admit I have no statistics, but it's been my experience that most places choose between a highly OO model + ORM and a highly relational model without.
> it's been my experience that most places choose between a highly OO model + ORM and a highly relational model without.
My experience has always been a highly relational model, ORM or not, and business rules enforced in app layer or DB (or a mix of the two). I've always seen them as distinctly different decisions.
Personally, in the past I was always a "rules in the app layer" guy, because of the many advantages of doing in that way, but as I get older the more difficult but guaranteed correctness of implementing in the database is becoming more appealing (especially if it's not me that has to actually write the code!!)
Is it the man or the woman who is important for a successful marriage? Your comment sounds similar to this :) I do not hate either. ORM or not, developers dealing with db data need to understand how it works. If they do not, they may still come up with a working application, but when performance issues come up, they become deer in headlights. The same argument applies to ORM or any other technology. Long-time Hibernate user here. For me, Martin Fowler's article hits the nail on the head.
Let me pitch you this scenario. You run Facebook, and you have all the software and all the data. A catastrophe occurs and you lose everything, and due to a mistake in the way backups were made, you can choose to restore the software or the data, but not both. (Somehow, restoring the software will destroy the data, and restoring the data will destroy the software). Which one do you choose?
Although that makes for an interesting conversation, it is a red herring / strawman / false dichotomy with respect to this discussion.
> To us, what's most important is the data, so everything else must serve that end
To you, yes, and I don't fault you for defending that perspective. But the real master who must be served is maximizing "profitability" while maintaining an acceptable level of risk.
Anyone from either side of this argument who ignores the very real advantages of the other side, or the risks of their own, is the one who is totally wrong. (Which would make the author of the original article the most "wrong" one in this discussion, as far as I'm concerned.)
I don't disagree with you. The point of the hypothetical situation is to dislodge a naive programmer from the sense that the code is the most important artifact and the database is just storage, a servant. I agree with you and jshen, that reality is nuanced and turns on the business, the system as a whole.
"To us, what's most important is the data, so everything else must serve that end"
This is ideological, right? What's most important is the business. Anyone that starts from the assumption that everything, EVERYTHING, must serve the end of the data, is wrong. Right?
We can make up interesting dilemmas all day. How about this one: there is an optimization that Facebook can make which is shown to increase monetization by 10%, but it creates some risk of data corruption. Engineers estimate that it will corrupt 0.01% of Facebook posts. Do you choose a 10% increase in monetization, or does everything have to serve the end of data integrity?
Yes, it is ideological. But you're right, at the end of the day, it's the business that matters. Developers are going to have guiding principles and it's not a bad idea to evaluate them every now and then; I hope my hypothetical at least illustrates why the alternative opinion exists. I quite like your hypothetical situation, it's the best retort, and offers a great motivating example for the rise of NoSQL, where ACID compliance simply doesn't make as much business sense as scalability with less validity.
"I don't believe in hypothetical situations" -- Kenneth the Page, 30 Rock.
Your response is long, but altogether too subjective, philosophical, and tautological.
What matters is consistency, usability, and agility. Throwing ORMs out the window will give you as much consistency as you can squeeze out of an SQL server, but will greatly reduce your agility. Using an ORM for everything will greatly increase your agility but will reduce your usability.
As in everything, there is a balance. People who fall on either side of that balance need to back away from the pulpit and rethink their stance.
But the truth is the days of DB being king are almost gone.
These days you just don't hear about DBAs at all any more. You used to see constant jokes about DBAs being a pain in the ass and stopping programmers from doing X or Y. ORMs are going to win because there aren't enough of you left. Stored procedures, triggers, etc. are going to be viewed as ancient technology from the days of yore when people didn't understand how to code properly.
At the risk of responding to a troll, being a DBA is far more than "Stored procedures, triggers, etc.". The relational database is still around (I notice you talk about databases as if all databases are relational, which is false), and will remain for decades to come because relational theory is sound and has proven to work for most use-cases.
The database is where you store your data. If you have data whose integrity is critical to your organization, a properly designed and maintained database is going to save a lot of hassle.
I believe that databases will remain important, and maintaining data will always involve restrictions on how you can use it. Restricting data is not a relational database problem; more often than not it's a business constraint. Oftentimes you don't want programmers doing stupid things with your data :-)
My experience seems to be different from yours. I am a working DBA and I have friends that are working DBAs and we generally do not have a hard time finding work.
I incidentally have stopped programmers from doing X or Y, but it was because the right answer was Z.
As for ORMs winning, I don't think it's a war. For some things I use and recommend ORMs, but for others I recommend pure SQL.
> Stored procedures, triggers, etc. are going to be viewed as ancient technology back from the days of yore when people didn't understand how to code properly.
You may be right about perception, but nearly every system I've worked on has contained a big ugly mess somewhere because the author didn't know how to use a SQL DB properly.
I would be interested to know what you mean by "real data store", and what you believe Facebook does badly at in terms of handling data and caching and how it could be improved.
(I am an engineer/developer/whatever at Facebook, and I'm always interested in hearing the perception of the company's technology from the community.)
1. I've always been under the impression that for what Facebook does, a traditional RDBMS simply cannot handle the scale (like, not even close). Is this correct?
2. I'm also under the impression that due to the architecture Facebook runs on, from time to time some less-important data (i.e., a status update or comment) can be lost, temporarily or permanently, and this is not considered unacceptable. (It seems perfectly reasonable to me for this particular use case.)
Perhaps it is best for the database team to talk about that themselves - wouldn't want to put words in their mouths. They gave a Tech Talk in December last year, which you can see at http://livestre.am/1aeeW
I'm running sales reporting and payroll software written in Django, and it's more like 10% of views that benefit from the ORM; for the other 90% the ORM is a hindrance and a serious performance hit.
I see ORM vs. relational style as a development-mindset issue. We have moved to using SQLAlchemy Core (not the ORM part), and the difference in development style is stunning: now it's quite easy and, more importantly, FUN to write performant queries, whereas when we were using the Django ORM it was very easy to write non-performant code and tedious to make it fast.
One could pose the question: if you are not doing aggregation, why have a SQL/relational database at all?
I know there are several valid reasons to use SQL databases as key-value stores: they are likely more robust than NoSQL dbs, there are API wrappers for most languages, etc.
But for many interesting problem domains, reporting and data aggregation are the raison d'être of the software, and that's why I would like to see more solutions in the spirit of SQLAlchemy Core that make it easier to structure complex queries as reusable pieces of code but do not force you to step into the world of ORMs.
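In that spirit, here is a stdlib-only sketch of a reusable, parameterizable aggregate query (the table and function are invented; SQLAlchemy Core does this properly with composable expression objects rather than string formatting):

```python
import sqlite3

def sales_by_region(extra_where="", params=()):
    """One reusable aggregate, optionally refined with an extra filter."""
    sql = """
        SELECT region, SUM(amount) AS total
        FROM sales
        {} GROUP BY region
        ORDER BY region
    """.format("WHERE " + extra_where if extra_where else "")
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL, year INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("east", 10, 2023), ("east", 5, 2024), ("west", 7, 2024)])

# Reuse the same aggregate with a different slice of the data.
sql, params = sales_by_region("year = ?", (2024,))
rows = conn.execute(sql, params).fetchall()
```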
If ORMs stuck to the 80-90% case I don't think people would have problems with them. Unfortunately, they try to handle an ever wider range of edge cases, which tends to create complexity and cruft. As soon as I need to double-check their output and I see horrible gobbledygook for a moderately complex query, I am generally better off writing it myself than trying to understand how to get the ORM to generate a cleaner query.
Still, ORMs seem to be getting better; LINQ to Entities in Entity Framework seems to work much better than a lot of the old ORMs I have used.
ORMs don't try to handle anything. People apply them poorly, which simply creates bad code.
The same can be said of the goto statement, or any number of other things - used improperly, they screw up everything, but in some instances they are necessary and good.
Actually, that's not really true. An ORM will almost always attempt to convert your object access to SQL. It's not always possible to make it work as well as a rewritten query. For instance, sometimes you need to use an EXISTS, or change a join order with a hint, or find a better way of reducing the driving table's rows to get better performance.
With an ORM, no matter how much tweaking you do, it is sometimes impossible to get the performance that a well written SQL query can achieve. And that's not because of bad code, or that the ORM is a bad choice.
BUT...those are problems that an ORM is not a good tool for. If someone is trying to use an ORM for that, that's the programmer's fault, not the tool's.
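A small illustration of the kind of rewrite being described (schema invented): the IN-subquery a generic ORM might emit versus the correlated EXISTS a hand-tuner might reach for. They return the same rows here, but on a real engine with real data volumes their plans, and their performance, can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1);
""")

# The shape a generic ORM often emits: an IN-subquery.
via_in = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
""").fetchall()

# The hand-tuned rewrite: a correlated EXISTS.
via_exists = conn.execute("""
    SELECT name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()
```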
SQL is great for what it does, but I use ORMs for reasons other than writing queries in a different way. I inherited an utter mess of a schema that wasn't even remotely close to 1NF. Imagine fields containing comma-joined sets of values, and with column names not even remotely related to what they actually held. For legacy and business purposes, updating the schema was a non-starter.
So I used SQLAlchemy to remap the schema into something usable. I wrote getters that split out those comma-joined fields and returned the desired values, against tables that required some truly unusual indexes.
I wrote (and therefore more importantly _documented_) the bizarre and complex way some of the tables joined together. I wrote something that was unit-testable and that could be used as a foundation for other work so that I wouldn't have to memorize the insane corner cases and reproduce them from scratch each time I needed to access the data in some little-used table.
I _didn't_ write a more convenient way to say `SELECT * FROM blog`. In general, that doesn't interest me and I wouldn't have bothered with it. ORMs are great - if used well! - for encapsulating all of the little bits of business cruft in one central, easy-to-manage place. They're handy for roughly the same reasons that subroutines are handy.
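A toy version of that remapping, with invented column names and plain Python standing in for SQLAlchemy: a facade class splits the comma-joined legacy field in one documented place, instead of at every call site.

```python
import sqlite3

# 'fld3' plays the role of a badly named legacy column holding
# comma-joined values; the whole schema here is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_blog (id INTEGER PRIMARY KEY, fld3 TEXT)")
conn.execute("INSERT INTO legacy_blog VALUES (1, 'python,sql,orm')")

class Post:
    """Readable facade over the unreadable legacy row."""
    def __init__(self, row):
        self.id, self._raw_tags = row

    @property
    def tags(self):
        # The comma-splitting quirk lives here, once, testably.
        return self._raw_tags.split(",")

post = Post(conn.execute("SELECT id, fld3 FROM legacy_blog").fetchone())
```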
This is an interesting use and I like the idea, but I don't think it makes him wrong. You don't actually need an ORM to do this. The same logic could be applied to data sets retrieved through regular queries. Or you could create views/stored procedures with the same logic.
We could be more subtle, and go with "it depends". For getting and putting data to be used on an edit form, using an object/class which actually has (non relational constraint) "business logic" in it, ORM good. For batch mass update, ORM bad. (one wonders if every table needs a custom class, though)
Batch updates aren't where it ends. As a Django developer with a strong SQL background I find my hands are tied far more often than I would like. Sometimes this is caused by bugs like the current group by bug[1], other times it's caused by the design of the ORM.
I do agree that the ORM is handy for things like "get me all the things in this table" and "update this single record using a form", but this author is dead on. There is a logical reason behind the hate developers have for ORMs.
Django's ORM is a piece of junk compared to a proper one such as SQLAlchemy, Hibernate, or NHibernate. I wouldn't go drawing conclusions until you've experienced something else.
What in particular is it about my question that caused it to be voted down? I'm actually very interested in hearing the issues that the responder is having with Hibernate. Unfortunately, because they haven't stated what it is that caused them so many problems, it's of very limited value to the discussion being had.
I don't know about the Django ORM; however, for some other modern ORMs, batch operations and statement reordering are among the top benefits. If you are developing a set of APIs with no idea how they'll be consumed, the combination of declarative transactions and heuristically reordered, batched ORM statements gives you easy-to-understand code, proper transaction semantics, and really good performance. Without the ORM and declarative transactions you may end up writing ugly APIs that pass transactions or connections around, and optimizing the database interactions becomes a major task. Learn your tool, use it for the right use cases, and you'll be happy. Putting your left shoe on the right foot will always feel wrong.
Can anyone point out the coup de grâce the author seems to think he has arrived at?
He points out some (well-known) ways that ORMs can be used inefficiently, and acknowledges the techniques that have been developed to work around these, but then seems to conclude that he has proven once and for all that ORMs are bad. I totally missed the connection on that part. Is it that SQL is better at dealing with sets than an ORM (a fact no one denies), therefore you should not use an ORM?
The connection (according to the author, as far as I've understood him) is: "All the techniques that have been developed to work around the inefficiencies of ORMs are just reinventing the wheel. The solutions have been there for 40 years and your workarounds are only needed because you think about individuals (object-oriented) instead of sets (relational)."
> The solutions have been there for 40 years and your workarounds are only needed because you think about individuals (object-oriented) instead of sets (relational)
(This is to OP rather than you)
Well, if that's the argument he's making, here's one example I can think of: in the case of an extremely complex update, while it can always be done in pure SQL, it is much easier to code logically using an ORM, perhaps even using individuals rather than sets (the horror). And while this implementation might execute slower (.1 second vs. .01 second), it is vastly simpler to read and refactor without screwing something up (i.e., economically cheaper), and as for the performance argument, it only needs to be fast enough.
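The contrast at issue, sketched with sqlite3 (timings omitted; the point is one set-based statement versus N object-at-a-time round trips): both approaches below produce identical data, but on a real network-attached database the per-row version pays a round trip per UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for t in ("a", "b"):
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(f"INSERT INTO {t} (amount) VALUES (?)",
                     [(float(i),) for i in range(100)])

# Object-at-a-time: fetch the ids, then issue one UPDATE per row.
for (pk,) in conn.execute("SELECT id FROM a WHERE amount < 50").fetchall():
    conn.execute("UPDATE a SET amount = amount * 2 WHERE id = ?", (pk,))

# Set-based: the same change as a single statement the engine plans as a whole.
conn.execute("UPDATE b SET amount = amount * 2 WHERE amount < 50")

same = (conn.execute("SELECT amount FROM a ORDER BY id").fetchall()
        == conn.execute("SELECT amount FROM b ORDER BY id").fetchall())
```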
The problem is, in the real world the performance divide between iterative and set-based solutions often spans much more than just one order of magnitude. For someone who's got solid experience in constructing set-based queries, the SQL solution can quite often be simpler to read and refactor as well. That's admittedly a big 'for', though. Good database folks reason about this stuff in fundamentally different ways, and there's a lot more to creating a good database person than simply teaching a programmer SQL.
In respects like that, ORM's greatest benefit is also its greatest downfall. Using an ORM means you can put people who don't have a strong grasp of databases in charge of your databases. Sadly, it also means that you've put people who don't have a strong grasp of databases in charge of your databases.
For that matter, only needing to be "fast enough" is fine if you're the only kid in the playground. That's often a safe assumption to make if you're writing app code, but less so with databases. If the database is being shared by a number of applications, or if the server is hosting multiple databases, or if you have to worry about concurrency, then being "fast enough" probably isn't enough. Because you've also got to think about all the other ways that your queries could be affecting everyone else, and making sure you aren't subjecting your server to the tragedy of the commons.
Which comes to another nice thing about having a dedicated database person. It means there's someone whose official bailiwick is the DBMS. If app A isn't experiencing any performance problems itself, but is causing performance problems for app B (say, because of some perverse locking situation), that's a bug that a DB guy is best positioned to diagnose and fix. If the application isn't too tightly coupled to the database (i.e., sprocs are in place) then he can even quietly fix it on the server side without having to hassle anyone about the application code. A team that's too ORM-reliant, on the other hand, risks failing to include anybody who's even well-equipped to recognize the problem, let alone fix it.
Which is creating a false dichotomy. Assuming you are using some OO language, you will have to, at some point, turn the relational data into objects. At that point you are always creating an ORM.
> Assuming you are using some OO language, you will have to, at some point, turn the relational data into objects. At that point you are always creating an ORM.
This is not true.
At some point, you will need to extract a subset of the relational data and represent it using your application's in-memory model. If your hand is not forced by the ORM, this is unlikely to be a direct mapping of the database.
This is no different than a network protocol, wherein the protocol is not a direct representation of application state, and the application does not attempt to model the network protocol using the same constructs that it uses to model its in-memory state.
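For instance (schema invented), the in-memory model can be a summary shape produced by a query, rather than a mirror of any table:

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 1, 'again');
""")

@dataclass
class AuthorSummary:
    """In-memory model: deliberately not a mirror of either table."""
    name: str
    post_count: int

row = conn.execute("""
    SELECT a.name, COUNT(p.id)
    FROM authors a JOIN posts p ON p.author_id = a.id
    GROUP BY a.id
""").fetchone()
summary = AuthorSummary(*row)
```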
I don't see how an ORM forces a direct mapping of the database on you. To me the benefit of an ORM is that most of the time I do want a direct mapping of the database, but there are often times when I need to add a layer of abstraction/validation/redirection on top of it.
The direct mapping of the database that it provides isn't really direct, and the impedance mismatch makes a mess of what could be a simple, easy to understand in-memory application model.
No, you haven't. If your in-memory application model mirrors your database model, you're probably doing it wrong; either your application model is poor, or your database model is.
I think his argument is basically the same as the argument that people who put down things like Redis use: You're just slowly recreating SQL. I don't think it's accurate, but that seems to be the argument.
Getting a single row by primary key, which is 90% of access, is overly verbose in SQL, so the ORM wins.
For slightly more complicated cases SQL is much faster and easier to write, so people who have to increment a field in all rows that satisfy a simple condition go: "ORM sucks".
But for more complicated cases, like trimming a data tree in some places, SQL quickly becomes too much of a puzzle for most programmers, so they prefer the ORM again, because there it's doable and most of the time it works. Dedicated SQL users who are not good at puzzles write a full-fledged program in such cases (if their SQL dialect allows for it), and instead of bringing the data to their iterative or recursive programs they bring their programs to the data, which creates hard-to-debug, unreadable, often unversionable monstrosities.
There should be some merger of databases and programming languages that combines the syntactic beauty of modern programming languages with the efficiency of modern databases at massive data handling.
Why is it OK to have a standard hashmap implementation in a language but not a file-backed hashmap or a btree index?
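For what it's worth, Python's standard library does ship something close to a file-backed hashmap: shelve, a persistent dict-like store backed by dbm:

```python
import os
import shelve
import tempfile

# shelve persists to disk but is used exactly like an in-memory dict.
path = os.path.join(tempfile.mkdtemp(), "store")

with shelve.open(path) as db:
    db["answer"] = 42          # written through to the file

with shelve.open(path) as db:  # a fresh handle still sees the data
    value = db["answer"]
```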
There is a need to manipulate relational data from object oriented code. ORMs are tools that facilitate that. The Object/Relational impedance problem doesn't go away if you hand-carve the code, it just makes you work hard on all the points of contact instead of just the problematic ones.
The real "problem" with ORM is when people use such tools as a way of avoiding having to understand databases (and specifically SQL). Fortunately that's becoming less common at least within the Enterprise Java world where I live and breath.
I think you've hit on something here. I came to start using ORMs after 10+ years of writing / optimising databases and SQL. When using an ORM (and most of my work is done with a probably not-well-known one, XPO from DevExpress) I'm aware of what is (or should be!) happening under the hood; my prior experience with the 'bare metal' is extremely useful, nay essential, to creating a performant system. On occasion it's necessary to do a direct SQL query, but not often.
Sure, the apps I'm writing are for small-medium business, but XPO 'just works'. Context is important; if I was working on something with more users / tighter speed requirements, an ORM may or may not be the best choice. Still, this falls into the 'right tool for the job' category that good developers are already aware of.
Any system I've worked in that didn't use an ORM still implements an ORM.
Save methods that run inserts or updates; get_all methods; get_by_username, get_by_id, get_by_email, get_recent, get_by_foreign_keyed_object, get_by_other_foreign_keyed_object.
And all those hand-coded methods are directly tied to the SQL dialect the original developer used.
Using Hibernate I had to write raw SQL for some reporting. I've never written raw SQL in Django (the projects I've used Django on self-select for simplicity).
And using SqlAlchemy/Twisted as a backend for a desktop application I haven't found a need yet, for performance or correctness. I have one query I'm eyeing for an SQL rewrite, but it'll probably be a week of work to make sure it works correctly and I'd rather release this phase than save 30 seconds on a weekly query.
I've reached a point where ORM complaints don't really make all that much sense to me. The issue seems to be "The ORM breaks down doing X and Y and Z, so I had to hand-write SQL!"
But you'd be writing X Y and Z anyway if you weren't using an ORM, so what's the issue?
Exactly. Or more explicitly, the anti-ORM argument, to me, consists of: ORM works fine for A through W, but SQL is better for X, Y, Z. So rather than the common-sense solution of writing just X, Y, Z in raw SQL, everything has to be written in raw SQL.
The argument works both ways; in fact, I think it's worse the other direction. Most projects use the ORM and just the ORM and it's against policy to write anything in raw SQL even if it's better for X, Y, Z.
The attitude is prevalent in the design of many ORMs. I'm both a huge advocate of ORMs and of SQL. A good ORM provides a simple and direct mapping from storage to the object model. But most ORMs go beyond that and try to cover all the query and performance possibilities available from SQL. Some ORMs have their own text-based query language!
I've met developers who can happily (and effectively) work with an ORM but hardly even know SQL! They certainly don't know SQL well enough to use it in the situations where it would be most effective.
I'm starting to feel like really effective set-based understanding of SQL is becoming sort of a lost art.
Any system I've worked with that didn't use SQL directly ended up implementing poorly a system for describing sets of data. SQL isn't a perfect implementation of relational algebra but it's better than a bunch of nested loops.
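To make the contrast concrete, here is the same question ("how many orders per customer?") answered both ways, with made-up tables in an in-memory SQLite database. The set-based version is one declarative statement the planner can optimize; the nested-loop version is what ends up hand-written in client code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Set-based: one statement; the database picks the join algorithm.
set_based = conn.execute("""
    SELECT c.name, COUNT(*) FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Nested loops: the same question, answered procedurally in the client.
customers = conn.execute("SELECT id, name FROM customers").fetchall()
orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()
loop_based = sorted(
    (name, sum(1 for _, cid in orders if cid == c_id))
    for c_id, name in customers
)

print(set_based)   # [('alice', 2), ('bob', 1)]
print(loop_based)  # [('alice', 2), ('bob', 1)]
```

Same answer either way, but only the first form lets the database reason about the whole operation as a set.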
As is so often the case, IMHO the poster is ignoring the business side of the equation in favour of "correct". A good ORM is fantastic for speeding up development. If you design your db around working well with the ORM, the potential development gains (at least in dynamic languages like Python) are HUGE. I left Django in part because of their ORM. But using SQLAlchemy, we develop far, far faster than if we were using SQL directly. And it's flexible enough to allow me to drop into the SQLAlchemy query language when I need to hand-tune a query, or to SQL itself if I really need to get close to the metal.
Will there be downsides one day? Probably. Will they come even close to the business value of the amount of the coding time we've saved during the critical bootstrap phase? No way in hell. As they say, those are problems I'd love to have.
This article is all about RBAR vs. set based operations. It has little to do with ORMs per se. That naive ORM users may tend toward RBAR is beside the point.
Go RBAR when it doesn't matter - when it's convenient and you know how it is going to scale ahead of time. A user updating his profile. Creating an order.
Go set based the rest of the time - when you're processing a large batch of data for thousands of user accounts, offload that work into a stored procedure and call it.
ORM does exactly what it is meant to do, and if you're working with data in an OO model, you're going to be doing ORM, whether you know it or not - you will either pull a decent ORM tool off the shelf, or you WILL be writing your own very bad one.
Row-By-Agonizing-Row, or iterating over a list of rows and operating on each individually with a new query.
Your average Java programmer is used to iterating over collections in a while loop, where 30,000 in-memory objects can be quickly modified. It's tempting for said programmer to do the same to ORM-backed objects and issue 30,000 sql update statements across the wire.
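A small sketch of that temptation, with an invented accounts table in SQLite (no network, so the round-trip cost is invisible here, which is part of why the pattern is so easy to fall into):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, 100.0) for i in range(1000)])

# RBAR: 1000 separate UPDATE statements; over a real network,
# that's 1000 round trips.
for (acc_id,) in conn.execute("SELECT id FROM accounts").fetchall():
    conn.execute("UPDATE accounts SET balance = balance * 2 WHERE id = ?",
                 (acc_id,))

# Set-based: one statement; the database touches every row itself.
conn.execute("UPDATE accounts SET balance = balance * 2")

print(conn.execute("SELECT balance FROM accounts WHERE id = 0").fetchone())
# (400.0,)
```
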
Well, now I'm reminded of my initial thought when a colleague first introduced me to an ORM: "It feels wrong to use this tool to just map every object to a table." Of course I went on to write many, at best, moderately complicated web apps very fast using NHibernate, and didn't miss writing vendor-specific SQL or column-to-property mapping boilerplate.
But this article is a breath of fresh air. I may just try Dapper for the next project.
I hated every single ORM "system" I have ever come in contact with from many different languages with the exception of one: SQLAlchemy. SQLAlchemy isn't just easy and intuitive, it is really smart and well built.
For a long time I was on the side of the ORM haters, until I tried SQLAlchemy - it truly is an awesome ORM package and I have yet to find anything like it in PHP/Ruby/etc...
I think the other point the author makes is that it's not possible to write efficient code that is entirely abstracted from the underlying data (see his loop examples).
i.e. if you have to write your code in a specific way to make the ORM behave correctly (constantly thinking about what kind of sql your code is generating), then the abstraction becomes a lot less useful.
I'm well open to correction on this, since I've not much of a clue, but with all this ORM back-and-forth, and relational databases, why do we not see more usage of graph databases?[1] From the wiki, it says they map more directly to OO applications. Is there a reason relational databases are still used by default?
Inertia, accidents of history. Relational databases had obviously-useful properties at critical points in the history of mass commercial adoption of computing. As a practical, real-world matter, "object-oriented programming" barely existed during the rise of relational in the 70s, making the practical problems caused by the impedance mismatch either non-obvious or seemingly unimportant.
As a result, huge investments were made in advancing relational databases, and other forms have languished. Today, relational maintains the substantial advantages of maturity and installed base. Performance, reliability, general polish, ease of access to support/tutorials/other literature, and general status as "the way things are done".
As an author of an ORM I can agree with the OP that using them incorrectly can lead to horrible punishment on the database. He seems to be most concerned with the n+1 issue which happens when you loop through objects and another query is performed on each iteration.
If you're going to use an ORM then you absolutely have to learn the mechanics to avoid n+1 queries. Every ORM is a little different, some are easier to tune than others.
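For anyone who hasn't hit it: the n+1 shape is easy to see if you count queries. This is a hand-rolled illustration with invented tables and a counting wrapper, not any particular ORM; eager-loading options (e.g. SQLAlchemy's joinedload) exist precisely to collapse the loop into the single-join form:

```python
import sqlite3

class CountingConnection:
    """Wrap a connection and count how many queries go through it."""
    def __init__(self, conn):
        self.conn, self.queries = conn, 0
    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)

raw = sqlite3.connect(":memory:")
raw.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO books VALUES (1, 1, 'x'), (2, 2, 'y'), (3, 3, 'z');
""")
db = CountingConnection(raw)

# n+1: one query for the parents, then one more per parent (lazy loading).
for author_id, _ in db.execute("SELECT id, name FROM authors").fetchall():
    db.execute("SELECT title FROM books WHERE author_id = ?",
               (author_id,)).fetchall()
n_plus_one = db.queries
print(n_plus_one)  # 4, i.e. 1 + n for n = 3 authors

db.queries = 0
# Eager: the same data fetched in a single join.
db.execute("""
    SELECT a.name, b.title FROM authors a
    JOIN books b ON b.author_id = a.id
""").fetchall()
print(db.queries)  # 1
```
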
I personally favor a basic mapping that the ORM does automatically combined with an advanced mapping that allows you to basically write queries for special purposes and map them to transient objects that don't necessarily exist in your schema. You have to do this for things like aggregate queries or calls that require several joins but only need a couple of columns from each table.
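A rough sketch of that advanced-mapping idea, under assumed names: an aggregate query whose rows are mapped onto a plain transient object that has no table of its own in the schema:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class SalesSummary:
    """Transient object: lives in the object model, not in the schema."""
    region: str
    total: float

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sales VALUES (1, 'east', 10.0), (2, 'east', 5.0),
                             (3, 'west', 7.0);
""")

# A special-purpose aggregate query, mapped by hand onto the transient type.
rows = conn.execute("""
    SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region
""").fetchall()
summaries = [SalesSummary(region, total) for region, total in rows]
print(summaries)
# [SalesSummary(region='east', total=15.0), SalesSummary(region='west', total=7.0)]
```
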
The OP raises some good points but I do think that ORMs can be used properly to great effect.
Yet another software religious war, which seems to boil down to "if you don't think my technology is correct 100% of the time, you are insulting my honor".
Do other fields get into constant pissing matches like this? Languages, libraries, process, licenses, editors, Operating systems, you name it, software engineers are fighting about how much better theirs is and how you are an idiot for not seeing the true light their vast intelligence is trying to bequeath unto you.
What is it about our brains that makes the subtlety of "use the right tool; every problem isn't a nail" so difficult? Or is it just hard wired into our need to be identified with a community?
It's probably directly proportional to one's OCD level or amount of pain accumulated through the years. There's also the need to show and feel that you're superior, and sometimes the need to point out that somebody's superiority is irrelevant.
I think the problem with ORM is that at the end it is only a wrapper over SQL, say like Winzip is for Zip. So while it may look pretty and easy to use, it always has to obey the rules of the host program, and so workarounds such as these have to be invented.
On the other hand, if somehow ORM were part of the core compiler AND the database, then it could be possible that even when you write a for loop on top with an if condition inside the block, or perform a join, the compiler understands it and pre-compiles your code without needing such workarounds (as there aren't two separate layers to join). So you would treat objects as objects and never have to worry about how the wrapper is being generated or what kind of indexes or queries it will finally run. I'm not an expert at compilers, but for strictly typed languages it could be possible.
That would never work though, at least not in the perfect, non-leaky way the author and you would like. When you have a loop, even if the database could somehow understand that loop, you can put anything inside it. You could make a call to an outside service, write to disk, etc. With a single SQL query the database can take it and optimize it, since it has a full understanding of its data domain.
There's a problem that's inherent to any client/server architecture: what work should get done on the client, and what should get done on the server? The author presents this as an ORM issue, but it has nothing specifically to do with ORMs; you'd have the same problem using an OODB server.
I agree with a lot of what he says; however, what about the case of getting 10 rows with 20 columns, each column needing a join? The optimizers often go nuts over this sort of thing, as that's 20! join orders the optimizer must consider to find a good one.
Apparently, Postgres has a genetic optimizer that handles this... curious to see if this is an issue or not.
Incidentally, I'd like to say that I loved the author's Relational Basics II at http://www.revision-zero.org/relational-basics-2 I've come to the conclusion that SQL is particularly limited in its application and implementation. I'd love to see a better declarative language for databases!
This is all very nice, except that users often do want to work on individual records (at least when updating them). Codd's simplifying assumption is for mathematicians, not users.
The idea of translating the declarative way of doing things to an imperative approach (that's basically what ORMs are doing) is imho a huge failure. It just never worked decently.
These days, we have languages which integrate rather nicely into the declarative mindset, so no need anymore for such bizarre "paradigm translators".
ORMs are powerful, because they let you say less and do more. For 90% of the queries out there, an ORM is fine.
SQL is powerful, because you can control and fine-tune your statements. For the remaining 10%, use SQL.
Are ORMs bad? No. Can you use them for everything? No.
The same thing can be said for almost every technology in existence.