This was my position for a while. ORMs introduce a layer of magic which obscures what's actually going on under the hood. I decided I would just make raw SQL queries and handle mapping data explicitly.
I quickly ended up with a lot of duplicated code. So then I thought, "Well ok, I should add a bit of abstraction on top of this..." I started coding some simple functions to help map the tabular data to objects. One thing led to another and suddenly I looked at what I had done and said, "Wait a minute..."
There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.
As an example, the SQLAlchemy docs[0] make this very clear: there's an ORM, but there's also just a core expression library that simply helps you connect to the database and construct queries.
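To make the distinction concrete, here is a minimal sketch (in TypeScript, with made-up `BookRow`/`Book` types) of the "helper functions, not magic" style: an explicit, boring mapping function you can read in one sitting, with no auto-persistence behind it.

```typescript
// Shape of a raw result row, as the database returns it.
interface BookRow {
  id: number;
  title: string;
  author_id: number;
}

// Shape the application wants to work with.
interface Book {
  id: number;
  title: string;
  authorId: number;
}

// The entire "mapping layer": one explicit, testable function.
function rowToBook(row: BookRow): Book {
  return { id: row.id, title: row.title, authorId: row.author_id };
}

const rows: BookRow[] = [{ id: 1, title: "Dune", author_id: 7 }];
const books = rows.map(rowToBook);
console.log(books[0].authorId); // 7
```

The point is that nothing here tracks state or writes back to the database on its own; it is just a function over data.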
I've never been a fan of codegen, but I think I could get past it for this library - it looks great!
I love how it lets you use SQL, while taking full advantage of TypeScript's wonderful typing system to give you intellisense and compile-time checking. Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).
I really like the way the readme has been written too - it gives a real insight into the thought processes that led to the final result.
I do like the code-gen solution a lot.
You can create code that is far less bloated than a generic framework.
I am old fashioned: I like to start with the database schema and generate code from that. When I make a change in the schema, I regenerate the code. Thanks to partial classes in C#, I can persist customizations between code-gens if necessary.
What just blew me away is the thing with the `JOIN` and the `to_jsonb(authors)`, all with complete typing support for the nested author object. I was actually looking to use a classical, attribute-driven query generator (with the sort of chaining API everyone is used to: `tableName.select(...columns)` etc.) for my next project, which may involve replacing/wrapping/rewriting a Rails app and its ORM with TypeScript and Node. Maybe I'll try this instead; I'm already half sold. Just worried about forcing colleagues to learn SQL instead of using a fancy wrapper.
> Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper.
My current team is pretty junior, and I don't see any problem with this. Simple SQL queries are really easy to learn, and complex queries are harder to understand with ORMs than in raw SQL.
Moreover, knowing SQL is a useful, marketable skill that will stay relevant for many years to come. If there's some resistance, I can easily convince the team that going this route will benefit them personally.
Back to the README, there are two questions I'd like to see addressed:
1. Whether `Selectable[]` can be used to query for a subset of fields and how.
2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?
I would love to see this move forward! I will definitely play with it and consider it for my next project.
1. Whether `Selectable[]` can be used to query for a subset of fields and how.
Right — this is not (currently) supported. I guess if you had wide tables of large values, this could be an important optimisation, but it hasn't been a need for me as yet.
2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?
Multiple authors with the same id isn't going to happen, since id is intended as a primary key, so I'd argue that the example as given isn't brittle. On the other hand, there's a fair question about what happens for many-to-many joins, and since my use case hasn't yet required this I haven't given it much thought.
OK, I gave the one-to-many queries a bit more thought, and the converse join query (getting each author with all their books, rather than all books each with their author) works nicely with a GROUP BY:
    type authorBookSQL = s.authors.SQL | s.books.SQL;
    type authorBookSelectable = s.authors.Selectable & { books: s.books.Selectable[] };

    const query = db.sql<authorBookSQL>`
        SELECT ${"authors"}.*, jsonb_agg(${"books"}.*) AS ${"books"}
        FROM ${"books"} JOIN ${"authors"}
          ON ${"authors"}.${"id"} = ${"books"}.${"authorId"}
        GROUP BY ${"authors"}.${"id"}`;

    const authorBooks: authorBookSelectable[] = await query.run(db.pool);
Right: querying only a subset of fields doesn't appear to be a built-in feature. It looks like you have to create the partial Selectable type yourself, and there is no support for typechecking that the correct columns are included in the SELECT.
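For what it's worth, that kind of compile-time column checking can be approximated in plain TypeScript with `keyof` and `Pick`. This is a hypothetical sketch of the idea, not the library's API:

```typescript
// A hypothetical table type (not from the library under discussion).
interface AuthorSelectable {
  id: number;
  name: string;
  bio: string;
}

// Pick a subset of columns from a row; the compiler rejects column
// names that don't exist on the table type.
function select<T, K extends keyof T>(row: T, cols: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const c of cols) out[c] = row[c];
  return out;
}

const author: AuthorSelectable = { id: 1, name: "Austen", bio: "novelist" };
const partial = select(author, ["id", "name"]); // typed Pick<AuthorSelectable, "id" | "name">
// select(author, ["nme"]); // compile error: "nme" is not a column
```

This only checks the application side, of course; tying it to the actual text of the SELECT is the hard part the library would have to do.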
As for your second case, if I recall correctly (ActiveRecord made my SQL skills fade away), this plain JOIN would just return another row with the same book but a different author; `to_jsonb(authors.*)` is only operating on a single row. But what you want (aggregating rows into a JSON array) is possible using `jsonb_agg`. Whether the lib supports inferring the correct typings for that is another question, though.
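To picture what that aggregation step does, here is the application-side equivalent sketched in TypeScript (hypothetical types): a plain JOIN repeats the parent once per child, and the fold below is what `GROUP BY` plus `jsonb_agg` performs inside the database.

```typescript
// One row per (author, book) pair, as a plain JOIN returns it.
interface JoinedRow { authorId: number; authorName: string; bookTitle: string; }

// One object per author, with the books folded into an array.
interface AuthorWithBooks { authorId: number; authorName: string; books: string[]; }

function groupByAuthor(rows: JoinedRow[]): AuthorWithBooks[] {
  const byId = new Map<number, AuthorWithBooks>();
  for (const r of rows) {
    let a = byId.get(r.authorId);
    if (!a) {
      a = { authorId: r.authorId, authorName: r.authorName, books: [] };
      byId.set(r.authorId, a);
    }
    a.books.push(r.bookTitle);
  }
  return [...byId.values()];
}
```

Doing it in SQL just pushes the same fold down to the database, which usually does it faster and over the wire sends less redundant author data.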
> Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper
I'd argue that learning SQL is essential for any developer.
It's also a "reusable" skill that will stand them in good stead for decades - whereas learning how to use the fancy wrapper is only useful until the next new shiny comes along.
I’d add that it’s essential so you can understand how to optimise and debug a query. You lose a lot of power if you can’t open up a console to describe or explain things.
The long-standing ORMs do a pretty decent job of writing efficient queries these days though. You can go pretty far without knowing much and that’s not a bad thing either.
I'm not even someone who has used multiple ORM styles extensively, but it is disturbing/darkly humorous how many ORM libs there are.
In that last post, you can map a half dozen Java frameworks to each of the acts.
Personally, I never found an ORM that tracked which attributes of an object were actually mutated, so that only the mutated columns would be updated/inserted; but again, I never did a lot of ORM work.
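That kind of attribute-level dirty tracking is straightforward to sketch with a JavaScript Proxy. This is illustrative code with invented names, not taken from any real ORM:

```typescript
// Wrap a record in a Proxy that remembers which columns were assigned,
// so an UPDATE can touch only those columns.
function trackChanges<T extends object>(record: T): { obj: T; dirty: Set<keyof T> } {
  const dirty = new Set<keyof T>();
  const obj = new Proxy(record, {
    set(target, prop, value) {
      dirty.add(prop as keyof T);     // remember the mutated column
      (target as any)[prop] = value;  // then perform the write
      return true;
    },
  });
  return { obj, dirty };
}

// Build an UPDATE statement covering only the mutated columns.
function updateSql(table: string, dirty: Iterable<PropertyKey>): string {
  const cols = [...dirty].map(String);
  const sets = cols.map((c, i) => `${c} = $${i + 1}`).join(", ");
  return `UPDATE ${table} SET ${sets} WHERE id = $${cols.length + 1}`;
}

const { obj: user, dirty } = trackChanges({ id: 1, name: "Ada", email: "a@b.c" });
user.name = "Grace";
console.log(updateSql("users", dirty)); // UPDATE users SET name = $1 WHERE id = $2
```

Real ORMs that do this (Hibernate's dirty checking, for instance) add snapshotting and flush semantics on top, which is where the complexity creeps back in.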
QueryDSL (http://www.querydsl.com/) does something like this for Java. It can generate classes from tables, but even with those, all queries / statements that hit the database are manually built using a query builder to avoid syntax and type errors. I.e. no caching or automatic database updates.
Fully agree! Some mapping code is _required_ in your application, otherwise it wouldn't be able to talk to your database at all.
However, I've never understood why people write this mapping code manually. I believe in code generation tooling as a potential solution for this (where types and maybe a full data access API is auto-generated based on the database schema).
> There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.
What's the big difference? Why do you like the former but not the latter? What are the characteristics of the former that makes it distinct from the latter?
Out of curiosity, what platform and tech were you using? I'm assuming a predominantly OO one, based on the virtues of ORMs. I have always found that when I try to build back-end or middleware platforms with OO-dominant languages (read: Java, C#, et al.), there quickly develops an impedance mismatch, and any communication with the database becomes a monster of mapping OO philosophy onto relational theory, whether home-rolled or via an ORM.
That being said, I personally have found that I do not like OO languages for back-end dev. Functional languages such as any variety of Lisp marry extremely well to the transactional and process-oriented nature of back-end systems, and they lend themselves to not having to jump through hoops to contort the data into relational sets (Clojure's destructuring is an absolute life saver here). I find there is little to no duplication of code when transferring data to the db. You may want to give Clojure or F# (depending on your stack) a try for your back end and see if it does not alleviate a host of issues with developing a process- and transaction-oriented system, which is what most back ends are.
I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read jQuery, React Hooks), turns to spaghetti rather quickly.
If you are using OO languages to communicate and transfer data to the DB you may very well be trying to solution for the impedance mismatch that is easily solved by using a functional language.
I can recommend hugsql to anyone who wants to work with SQL in Clojure. You basically write pure SQL with some minor templating helpers and then you get the data in a map with Clojure data types. Very nice and minimal overhead:
https://www.hugsql.org/
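The same "pure SQL plus light templating" idea can be sketched in TypeScript with a tagged template that turns interpolations into numbered placeholders plus a parameter list. This is a hypothetical helper illustrating the approach, not hugsql itself:

```typescript
// A tagged template that keeps the SQL readable while turning every
// interpolated value into a numbered placeholder and a parameter,
// so values never get spliced into the query text.
function sql(strings: TemplateStringsArray, ...values: unknown[]): { text: string; params: unknown[] } {
  // With no initial value, reduce starts at strings[0] and i is the
  // index of the next segment, which matches $1, $2, ... numbering.
  const text = strings.reduce((acc, s, i) => `${acc}$${i}${s}`);
  return { text, params: values };
}

const minPages = 300;
const q = sql`SELECT title FROM books WHERE pages > ${minPages} AND lang = ${"en"}`;
console.log(q.text);   // SELECT title FROM books WHERE pages > $1 AND lang = $2
console.log(q.params); // [ 300, 'en' ]
```

The resulting `{ text, params }` pair is exactly what parameterized query APIs expect, so the "templating" never opens an injection hole.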
> I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read jQuery, React Hooks), turns to spaghetti rather quickly.
ClojureScript, Reflex, Grapefruit, Seesaw, and a host of others. It's my opinion (so take it with a grain of salt), and it may well be the way my brain works, but I just find that functional doesn't marry well to UI development. For the service- and process-oriented stuff associated with the front end I think it is great, but when it comes to modeling components, I find objects and inheritance work far better.
This is one of the reasons I have long been a huge proponent of Javascript despite its warts: it can be OO when I need it to be OO and functional when I need it to be functional.
I've used ClojureScript with re-frame professionally for over 2 years, and it's been the first time in over 20 years of development that I've enjoyed working on the front-end. Unidirectional data flow with a single app atom has been a dream to work with.
It was a Flask app using SQLAlchemy (so Python). I'm not sure functional programming would have changed the situation much. I imagine there would still be repeated patterns involving reading and writing to the database in slightly different ways, and it would still make sense to use some sort of library. But I haven't used functional languages much, so I can't say for sure either way.
Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app. So you don't need an ORM. You might still have a library to help you build dynamic SQL queries, but no object-relational mapping needed.
> Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app.
You can do that in an OO language, too; relations (whether constant or variable, and whether their data is held locally by the program or remotely, as in an RDBMS) are perfectly valid objects.
A query builder carefully preserves the underlying relational and RPC semantics and exposes all of that to the user in an easier-to-use form. That’s just good cautious modest abstraction.
An ORM believes it knows way better than those dumb RDBs how a database ought to behave by ingeniously pretending that everything you’re dealing with is just nice simple familiar arrays of local native class instances. Which, like all lies, ends up spawning more lies as each one starts to crack under inspection, until the whole rotten pile catastrophically collapses under the weight of its own total bullshit.
And of course it goes without saying which of these approaches our arrogant grandstanding consequence-shirking industry most likes to adopt.
Maybe the problem is not that you don't need a mapping layer, but that ORMs are obscure. And maybe they are obscure not because SQL is such a cursed spot, but because object-oriented programming ITSELF drifts toward obscurity and magic. Don't you get the same feeling of obscurity from other libraries, e.g. web servers or clients? I often find the bare specs much clearer than the (supposedly simplified) OO libraries that implement them.
No, it is not okay. If you need insert/update/select for every object/table, that is way too much duplication, and it becomes very irritating when the schema changes. There should be table metadata in such a case, but one should also know what one is doing. Having no idea that under the water 14 joins are being done is not a good situation either.
Well sort of. In my view, duplicate SQL chunks that have defined business logic should either be a new table/view or extremely well-documented with really rigid communication policies for changes.
For example, consider a company with many data analysts/scientists who may each be writing their own queries. As a basic example, if the definition of some “very important” company metric changes, then a large number of dispersed queries would need to change.
But an ORM isn’t the answer for the above situation either.
It's relative; if the duplication is that large, maybe you do need to abstract.
It also sounds like you would be well served using a service abstraction at that point to remove the data layer from client scope entirely.
The "model changes, now we have to change it everywhere" problem isn't going to be solved by abstraction; it's only limited by how much you're willing to restrict access to the underlying model. If you need that information, you need to share the model.
The best solution to this I've seen in practice is domain modelling: colocating shared code near its other users. When things get too distant, you start using anti-corruption layers, which allow more flexible model changes.
But at the end of the day this is essential complexity; an ORM, or any other solution, is never going to be able to hide the fact that you need information elsewhere in the system to be useful.
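To illustrate the anti-corruption layer idea with a small sketch (hypothetical models, not from any real codebase): the consuming module never touches the foreign vocabulary directly, and a single translation function absorbs upstream model changes.

```typescript
// Foreign model, owned by another team or service; its naming and
// encoding conventions are not ours to control.
interface LegacyCustomer { CUST_ID: string; CUST_NM: string; STATUS_CD: number; }

// Local domain model, owned by this module.
interface Customer { id: string; name: string; active: boolean; }

// The anti-corruption layer: the one place where the foreign
// vocabulary is translated. If the upstream schema changes, only
// this function has to change.
function fromLegacy(c: LegacyCustomer): Customer {
  return { id: c.CUST_ID, name: c.CUST_NM, active: c.STATUS_CD === 1 };
}
```

The model still has to be shared in some form, as the parent comment says; the layer just confines the blast radius of a change to one translation point.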
I’ve never seen a homegrown ORM that was better than a third party one. Whenever there is an issue - and there are always issues - you have to dig into the code, because they are never documented well.
There is usually a feature that no one thought about and then you have to make modifications to the custom ORM and you get an even bigger mess.
You just said "you know, if you wanted to have a shitty burger, you can get it right there for half the price of a national chain and it will be at least as good"
One of the most popular Micro ORMs for C# is Dapper which is used by Stack Overflow.
There is no real abstraction. You write standard SQL and it maps your recordset to an object. You know exactly what code is running.
There are extensions that will take a POCO object and create an insert statement (and I believe updates), but where ORMs usually get obtuse and do magic is SELECTs. It's hard to generate a suboptimal INSERT or UPDATE.
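The object-to-INSERT trick those extensions implement is simple enough to sketch. Here it is in TypeScript rather than C#, with invented names (not Dapper's actual API):

```typescript
// Derive INSERT text plus ordered parameters from a plain record.
// There's little room for this to be suboptimal: one statement,
// one row, columns taken straight from the object's keys.
function insertSql(table: string, record: Record<string, unknown>): { text: string; params: unknown[] } {
  const cols = Object.keys(record);
  const placeholders = cols.map((_, i) => `$${i + 1}`).join(", ");
  return {
    text: `INSERT INTO ${table} (${cols.join(", ")}) VALUES (${placeholders})`,
    params: cols.map((c) => record[c]),
  };
}

const stmt = insertSql("books", { title: "Dune", author_id: 7 });
console.log(stmt.text); // INSERT INTO books (title, author_id) VALUES ($1, $2)
```

Note that table and column names come from trusted code here, not user input; only the values go through placeholders.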
Ruby on Rails’ ActiveRecord, for all its heft, is excellent at this. You can use raw SQL any time you like. It was an explicit design goal from day 1.
There are times I dislike things about it, and it can be quite heavy, but it’s very easy to mix and match ActiveRecord ORM code and raw SQL even within a single model class.
That’s how I’ve done it on my last two projects. We used TypeORM for the standard repeated simple queries, and then wrote custom SQL for our complicated queries that the ORM failed at and then just executed them with the ORM. It was really nice and made for easier table refactors because we didn’t have to go through and audit every query that was calling that table.
TypeORM is a step in the right direction for JS ORMs, but it's like 1/8th of the way there IMHO. A nearly fully typed ORM is possible now with TypeScript, and of course Proxies are available now. TypeORM was also doing too much ad-hoc string building under the covers. I believe a SQL AST is the way to go: it can be transformed and compiled to database-specific SQL, allowing for things like predicate push-down, optimization, and a sane way to implement db-specific optimizations and extensions.
I really like your point about type-safety! I think one major issue with the current ORMs in the Node.js/TypeScript ecosystem (Sequelize, TypeORM, ...) is that they're not fully type-safe. As you mention, TypeORM is definitely a step in the right direction here, but the way it leverages the TypeScript type system isn't where it could be. I work at Prisma, where we're building [Photon.js][1], an auto-generated database client with a fully type-safe data access API.
The informal definition I have of a micro ORM is an ORM without an identity map and without lazy fetching through proxy properties. Are there any more concrete definitions?
So then you’re mapping to whatever data structure your app uses instead of objects. In OOP languages like Python, everything is some type of object anyway.
That’s just a semantic game. If your language returns the result of a query as a generic array of generic dictionaries (or whatever), that isn’t mapping, nor is it object oriented in principle.
But then your generic array of generic dictionaries needs to be mapped to whatever data structures make sense for your application. ORMs save you that step.
I've written many apps and I've never experienced what you've described. I write my queries to return exactly the right data in exactly the right format in exactly the right order—so I can go straight from the generic data structures to the screen interface or document layout.
Nothing says the structure which make sense for an application can't be a generic array of generic dictionaries.
If your favourite programming language forces you to go through the silly hoops of data mapping in order to do useful things with query output, I can understand why an ORM might make sense for you.
This is correct. An in-house solution is a solution developed in-house for your specific problem, which no one else has ever had exactly. The more specific the need, the more the benefit of the made-to-measure solution. The alternatives are something your organization didn't develop, which may be better, but you don't know how to use it, or may be worse, but you don't know that when you pick it, or may be slower, but you don't know that when you start using it, or may have vendor lock in, but you don't know that when they sell it to you as "open", or may have hidden pitfalls, but they aren't in the glossy brochure, or may be unmaintained by anyone except your org in ten years, but you can't know that until ten years from now, or may be full of security holes because it was developed by idiots, but you can't know that because you didn't see who wrote it, or might be full of solid security features and a great design cleverly compromised by a hidden flaw placed in a specification you haven't read by a nation state, but you don't see that because why would you, or... etc etc etc. <sarcasm>But don't worry, at least you didn't have to understand the problem space well enough to be able to sit down and solve it yourself, so you sure saved some effort there!</>
Which bring us to the topic of tradeoffs and the synthesis of balance, by way of weighing competing advantages and costs fairly.
On the one hand, code you must write and understand. On the other, code someone else wrote, that you can just use. There is no clear winner here. It's always a tradeoff.
I rewrote our entire database layer in Hibernate for our (incredibly complex monolith) webapp. Then I was tasked with rewriting a major core piece of search functionality that builds a query from user selections / saved queries.
I was told that contrary to previous work, Hibernate Criteria would not be allowed, since it was deprecated. Hibernate's official replacement for programmatic queries is JPA Criteria, but Hibernate's support for this was not feature equivalent to Hibernate Criteria, so this was out too.
So what I got the green light for was writing my own pseudo-ORM wrapper that generates HQL query strings and parameters. HQL is not deprecated, you see.
It's ended up working out moderately well. It's a thin layer, and as long as you avoid the rough edges it works fairly well, while also providing a convenient point to translate query language from the old Kodo format into Hibernate (cringe, code smell).
There have been times I've had to do some very awkward query shit that I've only managed to lever in via HQL. You have no idea, views on top of views.
No idea what'll happen after I leave, that's their problem!
Thanks for the job security, Hibernate team. Your incredibly-poorly-executed transition from a well-supported standard to the "new new" has been exquisitely great for my job security.
Hah... I’m probably falling for Poe’s law here, but anyway: there are certainly cases where in-house is better than an external dependency, specifically when your team knows the tech domain better than anyone external can. But in general, well-maintained external dependencies (preferably open source with a community, or from a well-funded company) are almost always better. They usually have years of fixing the edge cases and features that you would inevitably run into if you were to roll your own.
They’re also tailoring their solution to be as generic as possible.
Having written OSS and also having written enterprise applications, it seems plainly obvious to me why a homegrown solution is preferred. Code developed internally is understood by the team (you may not understand the underlying implementation of a dependency), and can be tailored exactly to suit your needs (ignoring edge cases that aren’t relevant, removing unneeded features). And you never have to worry about maintainers disappearing, breaking changes being introduced, or bugged releases that you can’t do anything about.
I don’t mean to sound crass, but how on earth could you think this is an example of Poe’s law? What’s so extreme about being a responsible developer? I didn’t say “every solution should be developed in house” (though I think most large projects would be better for it!). Obviously there is a cost associated with in-house solutions, and you should gauge that cost to see if it’s worth it for your application. But if you’re going to be working with that application for years and years to come, then I highly recommend trying to write your own code instead of relying on libraries.
To counter your arguments, I'm going to use a couple of typical examples of when an in-house versus open-source / external debate comes up. I'm not counting the infamous "leftpad" cases, those are usually trivial, and really don't matter in the grand scheme of things. If it's a one-liner, just implement it yourself.
1. A high-level database or queue lib, a custom/powerful serialization lib, or (relevant to this topic) an ORM or other foundational/low-level part of your tech stack.
What you can expect to happen is a bunch of very good programmers early on build powerful abstractions using macros, metaprogramming, advanced type system concepts and build up a codebase adding up to a few thousands of lines. It just works, it's a good system - a few bugs are patched by the team every month, but that's fine. Fast forward a few years, the programmers have moved on, "onboarded" the rest of the team to the codebase during their respective last week, but given how complicated the codebase is no one is really capable of debugging it and fixing issues. And given that it's not open-source, it never got an opportunity to build a community of contributors. Your team is now SOL, and it's going to take _months_ to replace it with a more well-maintained open-source solution.
2. Building an A/B testing implementation in-house. Again, a couple of good programmers build a working, scalable, basic system in a weekend. It actually works, and the code is good, simple, readable, and well-tested. But then your PM team or your Marketing team wants you to add graphs. Then export the data to RedShift. Then tweak the algorithms powering the backend. Then multi-armed bandit. And so on. Now what was once a weekend project turns into months of work, whereas there exist paid services that do this for you.
Sometimes it's unavoidable: external alternatives are genuinely not good. But I strongly think you have to be very, very careful about building systems in-house when they are not your business.
> I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law?
Sorry for this. I do feel quite strongly against your original comment (at least the way it was written without context), and I think it's the _opposite_ of being a "responsible developer" in all but edge cases, and think you are wrong. But calling it an example of Poe's law was not right on my part, and was harsh.
> But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.
I've done this, and have done both in-house and oss code, but in-house, very reluctantly - for example - when there's just one or two maintainers committing code, and there's no alternative. But even then, I have usually forked the code and used that or parts of that as the base, rather than starting from scratch
There is hardly ever a time where an in house solution is better than a third party one for cross cutting concerns. Most of the packages are open source.
Idunno man, my day job working on Rails code uses a custom mailer and job queueing system, and every time I have to work with it I really wish they'd used ActionMailer and ActiveJob.
Like I said, “almost” always. Really the larger the application and the longer time you as a dev will work with it, the more meaningful it becomes to write your own solutions.
It’s really a balance, but I don’t think it’s a balance most devs consider and they really should.
While I agree with you, I'd like to point out there are interesting exceptions: software components that can never be "complete".
Such components require permanent maintenance workforce, and you might not want to dedicate resources for this.
Such as:
- API abstraction layers (like SDL, Allegro, SFML, etc.): you want to support new operating systems / new APIs by default. And most of the time, you don't want to spend time learning about the specifics of X11 window creation or Win32 events, as this would be throw-away knowledge anyway.
- hardware abstraction layers: you want to support new hardware by default, this is why we use operating systems and drivers.
- Format/protocol abstraction layers: if your game engine only uses JPEG files coming directly from your in-house asset pipeline, it's perfectly fine to develop in-house loaders (from scratch or from stb_image). But if your picture-processing command-line tool aims to support every file format (especially the ones that don't exist yet), then you should rather go with an updatable third-party library, which will give you all new formats by default.
- All kinds of optimizers, including compilers, code JIT-ters, audio/video encoders, etc. More generally, all code that uses some heuristic to try to solve a problem that's not completely solved/solvable. You might be ready to accept the performance of a specific version of, for example, libjit. But you might instead consider that, in your case, not having state-of-the-art JIT performance would be detrimental to your business, in which case you want to get the performance enhancements by default.
Lack of testing, lack of documentation and lack of use would be reasons that your claim is usually untrue. You can't Stack Overflow a problem and see if anyone else has encountered it before.
This is a terrible reply; what are you even trying to say? You can't Stack Overflow a problem, so don't write your own in-house solutions? Lack of testing? We write our own tests. We write our own documentation.
It’s crazy to me how many people on HN are ignorant of the costs of third-party dependencies and the benefits of in-house solutions when building large applications.
I am trying to say that most of the home-grown solutions I have seen have been pretty poor quality, and especially lacking in documentation. Do enough maintenance programming and you will understand.
If you do test and document your own stuff properly you are in a small minority. Why not release it for others to use?
In house means more customized to the specific problem but with far less expertise in the general technology. I find the latter almost always outweighs the former when working at any cost center tech shop.
Opposite; an in-house solution is almost always worse than an external dependency when that dependency is something as important to get right as an ORM.
The problem in many cases is actually in the OO part, in my experience. In the vast majority of cases where databases and persistence are concerned, staying in the procedural/structured + relational world keeps things simple, whereas objects often obscure what is actually happening and invoke opaque magic such as ORMs.
I wonder what your experience would have been if, after dropping the ORM, you had gone one step further and dropped the objects.
After contemplating my distaste for ORMs more carefully, I've come to the realisation that my objections aren't so much to do with the concept of an ORM but rather object orientation itself—and the fetish of treating it as the perfect hammer for every nail.
For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it usually turns out to be a mistake; de-objectifying can often result in simpler, shorter code with fewer data bugs.
Ultimately, the right answer depends on the nature of your particular business logic, how data flows in your wider ecosystem, and pragmatically, the existing skills of your workforce.
Seriously Jesse, isn't this the very same reason __some__ people end up implementing yet another programming language without realizing it?
First they start out of exasperation with X language they use, because they hit some obstacles or limitations, and before they know it they end up implementing a newly created language.
You know what the fun part is? In their attempt to fix the aforementioned language's issues, they end up introducing __the very same problems__ into their own language, only under a different "cloak", so to speak.
SQL is great if you will have multiple applications looking at the same dataset. E.g. An employee management program and a payroll program. In this case you should design a sane schema and mold the app around it.
ORMs are terrible in this sort of world since they tightly couple the application to the data. But if you will only ever have one application anyway the abstraction of a separate schema is pointless.
A lot of people who believe only one app (or one language) accesses their org's datastore are mistaken. You have to take extreme measures to prevent ad hoc uses from popping up.
1. If you are doing anything interesting, people are going to ask questions about what you are doing, and the best way to answer those questions is going to be by querying your database.
2. One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore: the legacy service and whatever you're replacing it with. I suspect any alternative to this arrangement will be an even worse idea, e.g. taking a deliberate outage to perform a likely-irreversible migration.
> One day you might want to rewrite some of your services, split them into microservices, etc. At that point, there will be a minimum of two services talking to your datastore.
You should not do this. It removes almost all of the benefits of extracting things into a separate service (services should own their data and the only means of accessing it should be via their APIs). That's not utopian; that's one of the main reasons you do a service extraction in the first place.
Right, so let’s suppose you already segmented the data to two different backing datastores, and your monolith is now connecting to both of them instead of just the one. Now you can do the service migration, at which point you still run into the situation I’m discussing.
Cutovers are hard, to be sure. Ideally they should also be short (the time a service undergoing mitosis spends talking to both the old and new locations should be measured in days or hours, or less).
Don't choose general data access patterns for the infrequent occurrence of cutover. Cutover is when you break a few rules and then immediately stop doing so. Build for everyday access patterns instead (which should be through the API of whatever owns the data--SQL is a powerful language and a really shitty API).
Of course.
But surely you don't let anyone access your API, and you put it behind another API, right? Just in case you need to change that first API without breaking all the users.
Never even let on that you have one, or else the founder will pat one of your most junior devs on the back and ask if he can give db access to that other team who needs to make money :D
I'm not the GP, but yes, absolutely. There are plenty of things that make this less than awful:
- The existence of tools that allow structured access to multiple APIs (GraphQL is a nice middle ground between "YOLO any queries you want" and "you only get row-by-row access exposed by the web APIs").
- The existence of data on multiple internal data stores. Analytics folks usually are not prepared to engage with the complexity of data being stored across handfuls or more of different stores with different schemas. The owner of the application knows how to join that stuff better than they do.
- Building intermediate/denormalized stores isn't frowned upon just because analytics shouldn't run ad hoc queries on the main production DBs. Expose change streams or bulk ("too much" data) endpoints and make it easy to load their results into a reporting system, which can be raw SQL. It's not redundant; if you don't do this, the following conversation starts to happen often: Q: "I'm running raw analytics queries on production and it's not quite working, can we just make $substantial_schema_change so my report works/is fast?" A: "No, we explicitly chose not to structure the DB/index/whatever like that because it seriously fucks up a real user access pattern."
Forcing analytics to go through the API doesn’t actually reduce load on the production DB; it just increases load on the API itself. Step 1 should probably be a dedicated read replica, and step 2 should probably be an ETL process.
Ding ding ding. Dedicated read replica and an ETL gets you to a point where queries don't bring down prod. If you have an analyst org running wild making bad decisions about data that they think says things it doesn't -- that's probably a good sign that it's time for a dedicated data engineering team, and potentially a BI flavored data science team as well.
Analytics queries bringing down prod seems... pretty amateur hour. I'm more interested in whether or not analytics queries actually get the data they're interested in when they want it. The reporting team is likely not better versed in what means what than the developers who work on the application databases. What about multiple internal DBs that reporting wants to analyze as if they were one? What about schemas that change over time, obsoleting the analytics team's assumptions? Reliable, versioned data access APIs address both of those families of problems. Yes, it's harder than "YOLO query prod". It also works for longer without breaking, and jibes with the scale-out plan (usually discrete APIs, sharding, and then maybe microservices and more families of APIs if you're mature enough).
> Reliable, versioned data access APIs address both of those families of problems.
They only address it in so far as they push it downstream to the analyst, who as you mentioned, "is likely not better versed in what means what than the developers who work on the application databases."
There's a reason why datalakes exist, and having used them at the past N companies, I think this is why data engineering of the BI flavor becomes necessary at the point that reporting becomes critical. An API is strictly worse than a datalake, and it's not hard to set up and maintain the latter. API versioning and communication are for frontend integrations, paid partner integrations, and potentially (although I'd probably lean more on gRPC and its ilk) microservice-to-microservice interactions. But I like to avoid building unnecessary API surface when I can.
I like Clojure’s HugSQL[1] for this reason: you can simply write raw SQL, but when you start duplicating code, you can start factoring those bits out into composable “snippets”. The best of both worlds: composability and reuse, while still writing raw SQL.
I have the opposite view. I find ORMs annoying and obscure, and I think they introduce duplicated code.
If you need to run a certain query in multiple places, you either repeat the same ORM expression or refactor it into a function. I find it much better to have a module with all my SQL queries as strings. That way, whenever I need to run a query, I reference it from there. Of course, it helps to use meaningful names.
This approach has a lot of advantages over ORMs:
* you know exactly what gets executed
* automatic DRY code
* the names of the SQL queries in the code are self-explanatory and the reader doesn't have to parse the ORM expression every time
Schema definitions are in standalone SQL files, as well as my migrations.
The only disadvantage is that it may be difficult to switch to a different database system, but that is not a problem for us.
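As a minimal sketch of that module-of-named-queries approach (the query names, tables, and columns here are invented for illustration):

```python
# queries.py -- one module holding every SQL query as a named string.
# Table and column names below are hypothetical.

GET_ACTIVE_USERS = """
    SELECT id, name, email
    FROM users
    WHERE active = TRUE
    ORDER BY name
"""

# Parameterized queries use driver placeholders, never string interpolation.
GET_ORDERS_FOR_USER = """
    SELECT id, total, created_at
    FROM orders
    WHERE user_id = %(user_id)s
    ORDER BY created_at DESC
"""
```

Call sites then reference `queries.GET_ORDERS_FOR_USER` and pass parameters through the driver, so every execution of a given query goes through a single self-explanatory definition.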
Your argument does not stand up. I can have a file full of ORM SQL fragments, the same as your file of strings. And I can compose mine together safely and more flexibly than strings.
Have you tried Python's SQLAlchemy, the ORM parent posts are praising? The `sqlalchemy.sql` module is awesome and pretty much maps 1:1 to raw SQL.
Composing SQL expressions using this library instead of using string interpolation/concatenation has several advantages:
* DRY and composition
* safety
* portability (if you have to switch the underlying DBMS)
Often the result is as good as or better than my raw SQL. The fact that Python has an amazing REPL makes the process pretty much like testing queries at the database prompt, but with less cognitive switching between languages.
In the end it is a matter of taste, but I have to agree with parent posts, SQLAlchemy raises the bar for other ORMs.
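For a flavor of what that looks like, here is a minimal sketch using SQLAlchemy Core only (no ORM). The `users` table and the data are invented, and it assumes SQLAlchemy 1.4 or later is installed:

```python
# SQLAlchemy Core: describe tables once, compose queries as expressions.
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, select)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("age", Integer),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

# A condition is a reusable Python object, so it can be shared across
# many queries instead of duplicating SQL strings (DRY + composition).
is_adult = users.c.age >= 18

query = select(users.c.name).where(is_adult).order_by(users.c.name)

with engine.connect() as conn:
    conn.execute(users.insert(), [{"name": "Ada", "age": 36},
                                  {"name": "Bob", "age": 12}])
    names = [row[0] for row in conn.execute(query)]

print(names)  # ['Ada']
```

Because values are bound as parameters rather than interpolated into strings, the safety point above comes for free, and swapping `create_engine("sqlite:///:memory:")` for another DBMS URL covers the portability point.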
> I find much better to have a module with all my SQL queries as strings.
But you can't compose them, so there is a lot of duplication. Also, how would you handle dynamic filters and columns? Concatenating strings? That seems error-prone. At least a nice query builder would be useful, but then the whole "just write SQL" thing falls apart.
Hahaha, wow! That’s just about the most awful thing I’ve ever heard. If you ever find yourself sitting across a table from an interviewer, I would definitely recommend not including this little tidbit in the conversation.
ORMs seem to be a typical example of over-engineering. Often you don't need all the complexity they come with, and when you do, you are probably better off understanding exactly what you are doing.
So maybe building a minimal API, wrapping your SQL queries isn't such a bad idea after all.
The things I've found positive about ORMs are exactly that mapping of results to business objects. The things I've found "not worth it" are the query-building APIs baked into the objects. These principles can be seen in a lightweight ORM I made, PureORM [1].
There is a lot of ancillary complexity in database connection libraries that we could attack before replacing the standard structured query language with some poorly considered mapping of objects to and from relations, inspired by poorly understood bad old OOP, which is generally what all ORMs boil down to.
It is about striking the balance between your own queries and the ORM.
My rule of thumb is that I always go with ORMs for MVPs and small apps. Optimizing for speed usually means going deeper and building a system of queries yourself. Until that point I usually stick to less verbose code and more to business rules.
Each time I see someone complain about ORMs I remember Greenspun's tenth rule[1], which adapted to ORM would be:
"Any sufficiently complicated program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a decent ORM."
ORMs are hard for a reason. Using an ORM doesn't mean you can't or shouldn't use plain SQL where the situation calls for it. You can mix and match perfectly fine.
To me it is more as Ted Neward describes it: "ORM is the Vietnam of Computer Science"[1]
"Although it may seem trite to say it, Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."
Ya I’ve heard this one a lot. It’s kind of funny to say and does humorously underline the complexity of the problem but people take it seriously. So to take it seriously for a second:
There was no good reason to be in Vietnam; even taking the stated rationale as a given, which many people did not, it was a concern many levels removed from the actual safety or functioning of American society.
ORMs in contrast achieve much more proximate goals — they solve a real problem and can measurably reduce the amount of code you have to write. It is tedious to write the same sort of SQL query over and over. Even if you prefer to use literal sql for the complex stuff (as I do), ORMs tend to be a significant win (in fewer LOC to write) on abstracting out basic queries.
Tbh you could easily claim that the explicit goal, mapping to objects, is incorrect.
The real value is to reduce the damage of the SQL language itself — the unnecessarily ordered clauses, the arbitrary inconsistencies in syntax, the worthless parser errors, the lack of any static typechecking — which cause so much code bloat and debug headaches.
There are two reasons to use the ORM: to not learn SQL, and to generate SQL.
The first reason is the commonly provided one, and is what leads us into Vietnam. The second is why people try to avoid ORMs, yet find themselves back in Vietnam.
What we really need is a less shitty version of SQL.
I have found the best ORMs don’t hide their SQLness much. SQLAlchemy is pretty great, but you don’t get full use out of it unless you have your arms around SQL itself. When you use an ORM to cut down on chores it’s great. When you use an ORM to avoid your datastore and its idiosyncrasies, it is worth taking a long hard look at why :)
Most of the ORM interactions are well formatted code that don’t hide the datastore much. As long as you let the ORM map in objects and it’s performant, life is alright. When the ORM starts running the show there may be no coming back.
I enjoyed this comment because I only recently tried SQLA for a project, and I agree fully that it embraces SQL, in large part by NOT renaming/rethinking things at the OOP level: methods very much tend to be named after SQL verbs. I really liked this.
What I liked less was all the setup/config ceremony. Compared to ActiveRecord (the Ruby lib, not necessarily the ORM concept) I was using more LOC before I got to the part where I started saving time on simple queries. I realize this is because SQLA uses the data-mapper model. But for me the sweet spot would be auto setup like AR with the SQL-like syntax of SQLA.
Yeah. When you get good at composing SQLa code it is really nice. You can functionally build your queries and DB interactions. It just fit well with my way of coding. ActiveRecord had a lot of magic to my taste, but you could still drop down with it. Agree SQLa can be tedious at first. No tool is perfect :)
> What we really need is a less shitty version of SQL.
My view is the opposite. The power of SQL perpetuates a low-quality software culture. The root issue is a dev culture that can't see past databases.
A lot of software design runs like this: (1) translate business patterns into a relational schema; (2) build interactions with that schema; and (3) as that gets harder, use SQL arcana and ORMs and views and stored procedures to squeeze out flexibility.
I worked like this for the first decade of my career. My systems got some use, and struggled along, but they are failed projects.
Repeatedly I had this feeling: the project is almost done, but there are some concurrency issues that I would not even know how to start addressing.
The database-centric design made it impossible to work past that.
After much searching, I came to this: Stored State is a brittle and unwieldy thing, and you want as little of it in your life as possible. The more you have, the harder you have to work to get anything done. Databases are an institution of Stored State.
As an alternative, you can derive state from messages.
I see the same problems, but they're usually caused by a reluctance to modify the database schema. Once you get comfortable doing that and keep your schema matching the requirements, it solves a lot of the problems.
What I was trying (and evidently failing) to say is that a sufficiently powerful query generation system, designed for people who actually like their database to be able to generate the exact SQL they would've written by hand, is essential to the 'mapper' part of the equation being able to smooth out any impedance mismatches between the appropriate object model and the appropriate database schema.
Hopefully that longer answer is a bit clearer than my first attempt.
SQLAlchemy is pretty close. Been quite a few years since I've had a chance to get drunk with the authors and compare notes though, so I'm not going to try and get into details because I'm pretty much guaranteed to get some of them wrong.
That looks fantastic. No magic mumbo-jumbo mapping, just simple type-safe SQL. Both syntax safety (no need to remember which of WHERE and HAVING comes first) and type safety on all fields. It's not advertised in the examples on the front page, but I also take for granted that SQL injections are completely impossible, since all data goes into functions and is not string-formatted, without the mess of having to remember the order of arguments as with prepared statements.
Anyone got tips on similar frameworks for languages other than Java, and for other DBs?
Ye. Static typechecking is the only thing in his list that I really care about, since you can "git gud" at SQL and not be bothered by the syntax/ordering/parser concerns. jOOQ is exactly what I want to bridge the gap between Java and the DB.
The ordering is always a problem, because your logic may not follow it. E.g. if your set of conditions applies to multiple queries, then you might know your WHERE conditions before you know your SELECT/FROM clauses.
So instead of building up your sql string in a straightforward fashion, you need to have at minimum an abstraction that delays construction.
You get led into Vietnam as almost a direct result of SQL’s context-sensitive clauses.
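A toy sketch of that delayed-construction idea (a hypothetical helper, not a real library): clauses are collected in whatever order the logic dictates, and only serialized into SQL's required order at the end, with values kept as parameters rather than concatenated in.

```python
# Minimal delayed-construction query builder (illustrative only).
class Query:
    def __init__(self):
        self._select = []
        self._from = None
        self._where = []
        self._params = []

    def select(self, *cols):
        self._select.extend(cols)
        return self

    def from_(self, table):
        self._from = table
        return self

    def where(self, cond, *params):
        # Conditions can be added before the SELECT list is even known.
        self._where.append(cond)
        self._params.extend(params)
        return self

    def build(self):
        # Only here do the clauses get emitted in SQL's mandated order.
        sql = f"SELECT {', '.join(self._select)} FROM {self._from}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        return sql, tuple(self._params)

# The shared condition is stated first, then the rest of the query:
q = Query().where("age >= ?", 18).select("name").from_("users")
sql, params = q.build()
print(sql)     # SELECT name FROM users WHERE age >= ?
print(params)  # (18,)
```

The same `where(...)` call could be applied to several `Query` objects, which is exactly the reuse that plain string building makes awkward.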
> What we really need is a less shitty version of SQL.
I think the same. In my spare time I'm building a relational language (http://tablam.org, and I accept more help!) starting even lower: without an RDBMS.
I worked in the past with FoxPro, and it was possible to build a full app, with UI and reports and all the stuff you can imagine, in a database-oriented language. You coded the UI in Fox, queried with Fox, wrote triggers with Fox, etc...
> ORMs in contrast achieve much more proximate goals — they solve a real problem and can measurably reduce the amount of code you have to write.
Why would you optimize for LOC rather than expressing the behavior you want well? In some situations sure—rapid prototyping—but that’s an odd assumption in a general case.
I would actually let OOP off the hook here. I think what did the harm in this case was the Java generation. The generation of programmers that were told that in the future they would only have to write the "business logic", and everything else would just happen. They were taught JavaBeans, ORMs, gigantic frameworks. They completely forgot that their code actually needed to execute, and no one cared about their "business logic" if the application didn't do what it was supposed to do.
This generation has only ever used ORM's, and so to them those tools must solve some hard problem, they are so complex after all. SQL must be hard.
It turns out that SQL is actually much simpler than ORM's. The failure modes are much simpler, and the implementations more robust. Sure, writing the code can be tedious, but tedious is not hard. Writing brainless code every once in a while gives you time to reflect on the design of your system, and think about the larger context.
This is the polar opposite of my experience. Not to mention that you undercut your own argument by admitting SQL’s tediousness: tedious code tends to lead to more tedious code, as subsequent developers fear breaking something, so they just add a layer on top of it, rather than addressing root causes. So that SQL view you had now has 10 inner joins in it, a union, and is being used by five other views now, each with their own similar levels of complexity. Trying to do any refactoring on this is almost humanly impossible, because you can’t keep all of it in your head.
And this is an extremely common situation to find yourself in. Code bases which are middle- to large-sized, and which are inevitably touched by various hands of various skill levels, tend towards complexity.
SQL is the worst source of unmaintainable and difficult-to-refactor code. It’s difficult to unit test. Error messages are more often than not inscrutable. (I’m looking at you, Oracle.) There is no idiomatic way to break up complex SQL into functions or classes. There’s no type checking. You don’t have the equivalent of Ruby Gems, Python’s pip, or Swift’s CocoaPods. IDE support is limited to syntax highlighting. Compare this with something like Eclipse or IntelliJ where you can just Ctrl-click on something and go to the definition. Want to rename a public method in a statically typed language? Pretty easy. Want to rename a column in SQL? Yeah, good luck. Not impossible, but you don’t have any guarantees that something won’t break until runtime.
So yeah. ORM has its place, and modern ones work very well at abstracting away SQL’s weaknesses.
> It turns out that SQL is actually much simpler than ORM's.
Are they simpler when it comes to matching the data to the application's data structures? One advantage of ORMs is that they encourage this setup from the start.
I think that this is true if you're writing your application in a way that requires object-relational mapping in the first place.
And that's only necessary when you're trying to manage your data in the application in an object-oriented way. And managing your data in an object-oriented way implies more than just the simple fact of defining classes to serve as data records. Those classes can be entirely equivalent to a struct in a procedural language or a record in a functional one. And I can't remember ever suffering from object/relational impedance mismatch when working in a procedural or functional language, which implies that the spot where you really start getting into trouble is when you trot out some distinctive feature of an object-oriented data model.
I submit that the original sin is treating instances of those data classes as if they are discrete entities that can serve as an application-side proxy for some other discrete entity that exists in the database, almost as if ODBC were just a more REST-flavored alternative to CORBA. Which is a thing that I've often been tempted to do in an object-oriented language, but never in a procedural or functional one.
Which isn't to say that I don't use anything to help with talking to databases in those other styles of language. It's just that I retain SQL as my query language (there are plenty of reasons to do it, none of which I'll bother to repeat here) and rely on a more Dapper-style library to handle unpacking the results into data structures. And I don't really consider those to be ORMs; they're just a special class of data mapping utility library.
So, in conclusion, I think that a more accurate stab would be "Any sufficiently hastily built, object-oriented, database-driven non-ORM program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a decent ORM."
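To make that Dapper-style distinction concrete, here is a minimal sketch in Python: SQL stays the query language, and the helper's only job is to unpack rows into plain records. The `authors` table and the `query_as` helper are invented for illustration.

```python
# Keep SQL as the query language; a small helper maps rows to records.
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Author:
    id: int
    name: str

def query_as(conn, cls, sql, params=()):
    """Run raw SQL and map each row onto the given class by column name."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [cls(**dict(zip(cols, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO authors (name) VALUES ('Ursula'), ('Gene')")

authors = query_as(conn, Author, "SELECT id, name FROM authors ORDER BY name")
print(authors)  # plain point-in-time records, not auto-persisted proxies
```

The records are snapshots derived from the database state, with no illusion of being live proxies for the rows themselves, which is exactly the distinction being drawn above.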
(1) While treating a single SQL table as a distinct entity is a sin by your argument, I would argue that treating a set of tables as a distinct entity is not. I'm referring to what DDD calls an aggregate, e.g. a customer (address + email history + scoring + whatever tables), or an order (line items, shipping info, payment info, ...) Reading the comments here, I'm starting to wonder if some kind of holy grail or at least useful approach lies in there.
(2) The impedance mismatch has two ends, and one might as well argue that the problem is not mapping tabular data to objects, but mapping objects to tables. This might be an underrated benefit of what is usually called "NoSQL", dwarfed by the whole discussion around "schemaless". Unfortunately, I don't have any experience in that area, but it's something I have long planned on trying. Note that I'm not trying to say that NoSQL "solves the impedance mismatch" but rather that, in DDD's approach to solve persistence, NoSQL might actually be able to do what DDD wants (e.g. with respect to defining a "unit of consistency"), unlike tabular data.
A distinction that may seem like pedantry, but I think is actually the crux of the issue I've had with ORM:
The problem is not in treating entities in the database as entities. It's in treating instances of classes in the application's memory space as local proxies for those entities.
You're on much firmer ground if you understand them as simple pieces of information derived from the state of those entities as of some point in time. Both in terms of ending up with a more internally consistent and robust approach to data (by virtue of removing some temptation to preserve an illusion that this data can be presumed to always be complete and up-to-date), and in terms of not artificially cutting yourself off from most of the power of the relational model.
You're right that this tension is somewhat resolved by just switching to using an object store of some sort. (NoSQL is far too diverse of a subject to treat as a single unit.) But it's a very particular sort of resolution, because it's a sort of least common denominator approach where you just drag the data store down to the level of the programming model. It strikes me as akin to resolving the difficulties in maintaining complex software in Perl by resolving to never let anyone write anything more powerful than a shell or CGI script rather than by looking for a language that will better support you for the long haul. It can certainly be a fine and reasonable choice. The problem comes in when people don't fully realize that that's the choice they're making.
This seems like one of those topics where people often feel the need to pick a side for some reason. I've often heard criticism to the effect of, "people only use ORMs because they don't know SQL. Learn SQL!" It's seemingly impossible to convince these people that ORMs are fantastic for reducing boilerplate code and they can coexist right next to raw SQL for problems gnarlier than "select all foo where bar equals baz."
Earlier in my career I made a point to deep dive into SQL. Long story short: I realized that there are a lot of very good reasons to limit the amount of raw SQL in your application that have nothing to do with familiarity.
SQL is just a bad language, and it’s unfortunate that we’re still stuck with it, basically unchanged, decades after its introduction.
YMMV, but for me it feels like magic every time I use it. I can get things done in declarative way that would take a while in any programming language I know (and I'm a Clojure guy, so I value simplicity and conciseness).
The only issue I really have with SQL is that the lack of control over how a query is executed can really be an impediment. It's like how people were talking in another thread about getting frustrated with garbage collection.
I spent a lot of time working in a department that wrote a lot of ad hoc Oracle SQL, for updates to a production system, and complex reports, and there was one guy that was ten times faster than everyone else, and also more likely to get his code correct, and so I paid attention to what he did.
He would break down a complex set of operations into simple individual queries generating temporary tables; he just wouldn't bother fighting with the optimizer, or trying to predict what it would do. So he not only wrote queries that ran quickly, but he wrote them quickly, and he got them correct quickly.
I would read Tom Kyte, exhorting people to use the full complexity of the language and the Oracle optimizer, but from my experience, it just was not the way to go. I wrote many page (or more) long queries that were things of beauty and then found that breaking them down into simple ones actually was usually much faster.
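The pattern is roughly this (a toy SQLite illustration; the schema and the steps are invented for the example):

```python
# Decompose one complex query into simple steps via temporary tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1,'a',10),(2,'a',200),(3,'b',5),(4,'c',300);

    -- Step 1: a simple, easy-to-verify intermediate result.
    CREATE TEMP TABLE big_orders AS
        SELECT * FROM orders WHERE total > 100;

    -- Step 2: another simple query over the intermediate table,
    -- instead of one large nested query the optimizer must untangle.
    CREATE TEMP TABLE big_customers AS
        SELECT DISTINCT customer FROM big_orders;
""")

rows = conn.execute("SELECT customer FROM big_customers ORDER BY customer")
customers = [r[0] for r in rows.fetchall()]
print(customers)  # ['a', 'c']
```

Each intermediate table can be inspected on its own, which is why this style tends to be both quicker to write and quicker to get correct.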
One fundamental thing that I don't think gurus understand, is that for the average grunt in a typical corporation, there is a separation of duties, such that you can't just go and change the things that a system administrator controls. So saying "your database is configured wrong" doesn't address normal life.
Although the pure relational model may be nice to think about, I find it really convenient to have a certain amount of sequential context, and PL/SQL always seemed to me to have a disgruntled relationship with SQL, so I've gradually tended towards Microsoft alternatives.
The best ORM (outside of ActiveRecord) I've seen was a proprietary hand-rolled one that solved the impedance mismatch.
It had a canonical XML format that entities were defined in and code generation for data access layers, domain models, view models etc.
It actually worked better than the abuse I've seen developers put EF through. I found it nicer and simpler than the times I've worked with Hibernate.
I'm not going to say it was perfect, but every time I think back on it, it makes me want to revisit the idea of leveraging a bit of code generation or metaprogramming to have a canonical definition of an entity transformed into concerns that deal with the given entity at different points in the application.
That part of it just really hit the sweet spot for me.
MyBatis has _some_ similarities, notably the defining entities in xml part. But it diverges after that.
Think of it more like MyBatis meets Spring Data JPA. Define the entity in XML, then run a codegen which gives you the CrudRepository class but also generates a controller that exposes an API with pretty good ability to specify ad hoc queries. Plus view models.
I think it worked because it was both reasonably well designed and hyper-opinionated.
What you are saying exactly highlights what the author is missing: that if your application has some logic, you will eventually have to map your database rows to your in-memory typed structures.
It sucks. Sometimes it sucks less if you map your queries as well, sometimes it sucks less if you stick to SQL and only map your results, sometimes it sucks so much you're better off with a no-sql solution.
But when using a relational database, ORM isn't optional.
OOP is optional, but if your program has any concept of structured data, then, regardless of whether that structure can be expressed syntactically within the programming language, you will necessarily have a mapping between that and the database. It might be as simple as a 1:1 mapping between relations and ADTs, with collections of references being used within the program to represent the keyed relationships that exist within the database scheme.
Db4j is an async transactional database engine that uses Java as the query language. It doesn't eliminate the O/R impedance mismatch entirely (it's not a graph or object database), but it does simplify it greatly, since everything is Java.
I love the idea and it's been fun for me to make demos with, but I've gotten almost zero feedback on the API, and what I have gotten is "can you add a SQL frontend". And yet neither ORM nor SQL are loved.
So the question (and I'm not suggesting that Db4j is the answer) is: "what's the API that would be most natural ?"
Unless you don't have objects... then no need for an Object-relational mapping, not a well maintained one, and not an ad-hoc one. All you need is a DB connection pooling library, some helpers or libs to help you build dynamic SQL queries and that's all.
ORMs lure you in with a false sense of neat abstraction. They have nice intuitive examples on their home pages. But then you use them in the real world, doing gnarly queries, and you realize that doing anything powerful and fast in the ORM requires its own completely separate abstractions, which are often difficult for the uninitiated to follow. It's also often a big pain to debug the raw SQL that gets compiled after the ORM does its magic.
The argument I've made before when going down the path of ORMs has been: do we forsee needing to use this model code on a different database engine? Outside of simple toy applications, or needing to support different engines with the same code, I agree that ORMs are more trouble than they're worth.
You don't use ORMs for gnarly queries -- that's not what they are for! They are for making manipulation of the entities easier -- reading the data out of the database in a way that makes it easy to modify.
You can (and should) use them for simple queries. You have a list of entities you want to query and filter, that's going to be fine. Joins are fine. But if you're doing some complex analysis, an ORM is the wrong tool. That doesn't mean it's a poor abstraction, or difficult to follow, or something to be avoided. It's not the right tool for that job. For the job it's designed for, it's going to save a lot of effort.
SQL is great for analysis -- it's pretty much what it's designed for. But for bringing data into your app and modifying it, SQL is cumbersome and verbose. If you're loading data into objects then you're just creating your own personal ORM anyway.
In ActiveRecord, there's a method called find_by_sql. You can't call it directly; it's a class method on an ActiveRecord model. So you have to choose which of your ActiveRecord models should be used to instantiate the rows of your result set. (What if your result set doesn't really match any of your models? Pick one arbitrarily.) Your SQL has some extra columns. What happens to the data in those columns? They get monkey-patched onto the individual objects. (Which is stupidly expensive in Ruby.) Other than that, the individual objects are fine. They even have all your smart instance methods, which may or may not behave properly with all the ad-hoc monkey patching.
If you tried to short-circuit all of that nonsense, you tend to get arrays of hash tables. Which is, in my opinion, already a perfectly adequate interface!
> What if your result set doesn't really match any of your models? Pick one arbitrarily.
I don't want to sound like the ORM defender, but I'm not sure I understand.
This sounds like a deficiency of Ruby and the ActiveRecord model. In Java, for example, you'd just write a new POJO for your query, which isn't exactly difficult. There are no "smart methods" or whatever.
It is a valid criticism that this can proliferate data classes, but that depends on the application.
I was writing in terms of what I actually understand and have used—which doesn’t include any Java ORM. In fact if there are Java ORMs that consist solely of POJOs which are populated by raw SQL queries, I would gladly use them!
The API docs don’t make it clear whether that’s still possible so I didn’t mention it explicitly, but I have done that before and that’s what I was alluding to with the “array of hash tables” comment.
In my personal experience, ORMs seem to encourage queries to get scattered throughout your logic (they’re just normal code and function calls, after all... at least, they look like it) and encourage mixing application-side logic with query logic. The former makes it incredibly hard to remove the ORM if you need to reach for raw SQL, and the latter leads to bad performance due to many application-database roundtrips and not filtering enough before sending data to the application.
Yes, both of these things can be solved through disciplined modularisation of ORM logic, but in my personal experience across multiple companies, most developers simply aren’t that disciplined and treat ORM code as any other application code, instead of treating it as the remotely executed database code that it actually is.
In my experience, writing raw SQL (through https://www.hugsql.org/ in my case), you are instead encouraged to think of them as separate and carefully consider the boundaries, which helps keep the queries and application logic modular and allows for more carefully crafted queries that minimise roundtrips and data shuffling.
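HugSQL itself is a Clojure library that loads named queries from .sql files, but the boundary it encourages -- queries defined in one reviewable place, away from application logic -- can be sketched in a few lines of Python (the registry and query names here are hypothetical, not HugSQL's API):

```python
import sqlite3

# SQL lives in one place, named and reviewable on its own, rather than
# being scattered through application logic. (A real HugSQL-style tool
# would load these from separate .sql files.)
QUERIES = {
    "active_users": "SELECT id, name FROM users WHERE active = 1 ORDER BY id",
    "deactivate_user": "UPDATE users SET active = 0 WHERE id = :id",
}

def run(conn, name, **params):
    """Execute a named query; the only way to query is through the registry."""
    return conn.execute(QUERIES[name], params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    INSERT INTO users (name, active) VALUES ('Ada', 1), ('Bob', 1);
""")
run(conn, "deactivate_user", id=2)
print(run(conn, "active_users"))  # [(1, 'Ada')]
```

The point isn't the helper itself; it's that the query boundary is explicit, so each statement can be pasted straight into EXPLAIN.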
Again, this has been my experience, across a number of companies. Perhaps your experience differs, in which case, I’m jealous.
Running raw user SQL isn't a prerequisite for something to be an "ORM"; it's a useful feature that most ORMs include because the authors recognize the many shortcomings. Also, by writing raw engine-specific SQL, you automatically invalidate one of an ORM's biggest selling points, which is being SQL-database agnostic.
And by "drop into", this typically means writing custom stitching code that stitches the SQL cursor results back into the models again. It's rarely straightforward.
> you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.
The biggest selling point is a massive reduction in boilerplate code. Database agnosticism is a feature that almost nobody ever uses, so who cares! Your proposed alternative is engine-specific SQL, so you lose either way. At least with an ORM, you'd lose significantly less: you'd just have to deal with the places you used raw SQL. Which, in my experience, is pretty small and pretty specific.
> this typically means writing custom stitching code that stitches the SQL cursor results back into the models again.
I often feel like people who complain about ORMs have either never actually used one or used a poor one. As long as my query matches the structure of my object(s), I don't need any stitching code. And if they didn't match, I wouldn't write stitching code, because that would be a waste of time.
If I'm writing a custom query, I'm probably not looking to integrate into the model anyway -- if it's used for reporting I'd just take the results as is. If I'm writing a query specifically to get a matching model object (to manipulate) I'm going to get the whole model and no stitching would be required.
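The "query matches the structure of my objects" point can be sketched without any ORM at all. A hedged Python example (the Employee class and schema are invented), showing that when the SELECT's columns line up with the object's fields, the mapping is trivial:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Employee:
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employees (name) VALUES ('Bob')")

# The SELECT's columns line up with the dataclass fields, so mapping
# is just splatting the row -- no stitching code required.
people = [Employee(**dict(row))
          for row in conn.execute("SELECT id, name FROM employees")]
print(people)  # [Employee(id=1, name='Bob')]
```

The stitching problem only appears when the result shape and the model shape diverge, which is exactly the case the parent says they avoid.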
> you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.
I haven't heard anyone talk seriously about database-agnosticism since the very early 2000s. Maybe some commercial products still try (choose MS or Oracle!), but it's rare nowadays.
The primary selling point of an ORM is that it abstracts marshaling/un-marshaling rows to/from entities. Instantiating and persisting entities to relational storage.
> And by "drop into", this typically means writing custom stitching code that stitches the SQL cursor results back into the models again. It's rarely straightforward.
That's not typical in most uses I've seen. Far more typical are things like:
- Go straight to SQL for reporting, since that's what SQL does. Useful in reporting contexts, and also for list/filter UI screens.
- Use raw SQL to query a list of entity IDs for updating based on some complex criteria. Iterate over the identifiers and perform whatever logic you need to before letting the ORM handle all the persistence concerns.
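The second pattern can be sketched roughly like this in Python, with a plain function standing in for the ORM's persistence layer (the schema and criteria are invented; a real ORM would add validation, hooks, and auditing inside the save path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL,
                           flagged INTEGER DEFAULT 0);
    INSERT INTO accounts (balance) VALUES (50.0), (-20.0), (-5.0);
""")

# Step 1: raw SQL selects just the IDs, using whatever complex criteria
# the ORM would struggle to express.
ids = [r[0] for r in conn.execute("SELECT id FROM accounts WHERE balance < 0")]

# Step 2: hand each ID back to the "ORM" (a stand-in here) so its normal
# persistence logic still applies to every update.
def flag_account(account_id):
    conn.execute("UPDATE accounts SET flagged = 1 WHERE id = ?", (account_id,))

for account_id in ids:
    flag_account(account_id)

print(sorted(ids))  # [2, 3]
```

Only the selection criteria are raw SQL; every write still flows through one persistence path.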
> I haven't heard anyone talk seriously about database-agnosticism since the very early 2000s.
Do you use the same database engine for your unit and integration testing as you do production? I don't. I use sqlite for unit and local integration testing, and aurora-mysql for production.
As a side note, I quite literally can't use aurora-mysql for local unit and integration testing. It doesn't exist outside AWS.
Unit testing code that touches the database is not useful, and in fact it indicates there is likely a design flaw. Code that reads data from or changes data in the DB should be self-contained.
An integration test that doesn't use the same DB as production is unsatisfactory.
Using a DB inside your unit tests is an antipattern and arguably a violation of the concept of a unit test in the first place.
Integration tests should run against a test environment, otherwise, what integration are you testing? I don't see the value in writing integration tests that test the integration between my code and a one-off integration test DB that exists solely for the purpose of integration testing.
There is no syntactical difference between Aurora/MySQL and regular MySQL until you start interacting with S3.
Also, if you’re using only the subset of MySQL that is supported by SQLite, you’re missing out on some real optimizations for bulk data loads like “insert ignore...” and “insert...on duplicate key update...”. Besides that, the behavior of certain queries/data types/constraints is inconsistent between MySQL and every other database in existence.
Finally, you can’t really do performance testing across databases.
> Do you use the same database engine for your unit and integration testing as you do production? I don't. I use sqlite for unit and local integration testing, and aurora-mysql for production.
The PHP ORM Doctrine supports a custom SQL like language called DQL that integrates your defined model/relationships as well as converting it to the correct SQL dialect for your backing store.
Queries look like:
    SELECT u FROM ForumUser u WHERE (u.username = :name OR u.username = :name2) AND u.id = :id
It's certainly limiting, though you can write (or find implementations[0] of) custom types[1] that can be pretty powerful (void where prohibited, limitations apply).
I've always thought the "being SQL database agnostic" theory of ORMs was more about a development team being able to choose from some common choices than about apps being portable in practice.
Yeah, I’ve never seen any complex applications using ORMs that were easy to port to another database. Hell, one company I was at switched from MySQL to MariaDB and even that took some work, despite the fact that they should be almost the same thing. If you switch to a more substantially different database, e.g. from MySQL to Postgres, then it’s even harder. I’m also a believer in using a database’s features when it makes sense and not limiting myself to the standardised subset of SQL just in case I might want to change databases later.
Hibernate promises that you use the database "as if" it was not there. You can drop into SQL whenever you need, but you cannot remove all the stuff you had to add just to make it work (saving, reloading, handling caching....). So instead of being "out of the way" it poisons your entire code-base.
Now this is an excellent point, and it matches my experience. An ORM smooths the initial ramp-up, then keeps creating speed bumps as development proceeds and you have to handle genuinely complex business scenarios.
They make simple things simpler, which is a good thing.
I wouldn't mind writing queries in SQL so I don't have to learn a new ORM for every language I use. However, I like that ORMs unmarshal results for me into objects / tuples / maps of the language, sparing me the work.
For complicated things I write SQL and possibly decode the results manually.
If you know SQL and know the ORM, you can just do the complicated things in SQL.
If you know a problem is going to be complicated, you know to use SQL.
If you don't know if a problem is going to be complicated, you can default to SQL and decide to use the ORM later if it's a good fit once you've got more problem details.
>You can (and should) use them for simple queries.
This is not a very compelling argument to use ORMs. It is saying "it makes easy things easier". This doesn't really buy you much value. The simple things are already simple. Bringing in a very large, complicated external dependency to make simple things simpler is not a good idea.
>If you're loading data into objects then you're just creating your own personal ORM anyway.
By this definition, the Postgres driver I use for Node is an ORM, as it takes rows of strings and type identification codes and turns them into JavaScript objects and JavaScript types. It isn't an ORM, though; that is not what an ORM does. An ORM converts one paradigm into a completely different paradigm, which is why it fails and is a terrible idea.
This kind of doctrinaire thinking, this sort of broad and bold declaration, is the stuff of high-traffic blog posts but not good advice for real world developers.
Django, just as an example, does a magnificent job with its built-in ORM. Millions of developers use it, and they are not all fools. A fool is someone who would set out to build a simple-to-intermediate CRUD web app by writing SQL.
> This kind of doctrinaire thinking, this sort of broad and bold declaration, is the stuff of high-traffic blog posts but not good advice for real world developers.
> A fool is someone who would set out to build a simple-to-intermediate CRUD web app by writing SQL.
🙄
SQL works great for simple-to-intermediate CRUD web apps too.
I love Django’s ORM. I wish Django would publish its ORM as a separate library. Every time I see a Flask / SQLAlchemy app I think to myself: gee, you could have saved at least half the keystrokes if you’d just used Django.
No it makes simple things easy. Simple things in raw SQL are bloody complicated. Even just getting data, manipulating it, and saving it is at least twice as difficult in maintainability and lines of code than using an ORM.
> An ORM converts one paradigm into a completely different paradigm, which is why it fails and is a terrible idea.
I'm not sure where people get the idea that ORMs fail at their job. They really don't. They do it very well and we're all quite happy.
All the anti-ORM arguments here are about how ORMs fail at completely different jobs than mapping objects to RDBMS operations. Well, duh. Nobody complains about cars that can't fly or planes that can't fit on the highway, but when an ORM can't cook bacon, suddenly "it doesn't fit the paradigm".
> Even just getting data, manipulating it, and saving it is at least twice as difficult in maintainability and lines of code than using an ORM.
That has not been my own experience outside of the most trivial queries. Once any amount of complexity is introduced, I find that ORM-based queries often make it difficult to really see what’s going on with indexes and locking, and I can’t just paste a query (e.g. to use EXPLAIN) without finding it in a query log first. Unfortunately, in my experience it’s rare to find teams disciplined enough not to treat ORM code as if it were normal application code (i.e. don’t mix it into your application logic), so you often end up with extra database/application roundtrips, filtering in the wrong place, etc. Yes, that last one isn’t technically the fault of the ORM, but when I see it again and again in real-world code, I start to think that most developers don’t have the discipline to be careful with ORM code, whereas when you write raw SQL outside of your application code, you have no choice. Not the ORM’s fault, but still a symptom of using one that I’ve experienced in multiple teams.
Things don't usually start out bad; they become bad only after you have a more complex codebase with many users (i.e. many queries). It's at this point that you need to know what a query is doing, yet it's at this point that the query logs are full of queries, so finding the ones I want becomes hard. Putting a trace on the queries also becomes hard when you have a large codebase where the query logic is intermingled with application logic. I mentioned this in my comment.
You say it's my "refusal to want to change the way you work that is the issue", but I never chose to work that way. All of the codebases where I've had this issue were inherited: I was not the one who decided to work like this, and I did not mix the query logic into the application logic. But that was my point: I've had the same experience across multiple teams in multiple companies, so blaming the developers seems like a cop-out and not much of a solution. I'll happily adapt the way I work, but I can't force existing teams to change. Maybe discipline could fix it, but I have not experienced this discipline anywhere I've worked. But sure, it's my fault somehow.
Eventually if you’re working with an object oriented language or if you have to convert the result to JSON, you’re going to have to convert one paradigm to another.
Unless you’re programming in assembly, everything you do is being converted into another paradigm that is what every compiler and interpreter does.
You can only use a fraction of the features of a SQL database when you use an ORM because they don't translate to the new paradigm. When I am using a high level programming language, it is merely helping me do things like manage memory. I am not constantly wishing I could drop down and work with pointers and so on. It is a foundational paradigm that builds on the top of the one before it. ORMs just present a different, incompatible paradigm.
There are plenty of scenarios in certain verticals where you need to control memory management like games and others where you need to program in assembly. Garbage collection is an incompatible paradigm where you need to control when memory is allocated and freed.
Back in the day when I was doing C, there were times when we just couldn’t get the speed we needed from the compiler. I wrote inline assembly. Does that mean C was unsuitable because it was incompatible with our performance requirements?
An ORM is not a high-level version of SQL though. A more accurate metaphor for an ORM would be like a converter from one high level programming language that is object oriented, to another high level programming language that is functional.
In the case of C#, LINQ is a declarative built in part of the language. At runtime, it is converted into an AST and if you use Entity Framework, it is translated into SQL - another declarative language.
You can theoretically express any standard SQL query in LINQ even though outer joins can be obtuse at first. The translation may not be as optimal as hand written sql, but no compiler can translate code into assembly that would be as optimized as someone who could hand roll their own. We decided decades ago that high level languages were worth the trade off most of the time.
This is where I disagree. I dislike having a value which is the entity. The basic lesson from relational databases and later data oriented design is that you don't have an entity. All you have are aspects that are related.
This is also tapping into one of the many optimizations you have to know about when using ORMs. Can you "select" 10M rows from a table? Is an object instantiated for each one? If they're lazily created, when are they destroyed, and where is the buffer of rows held, client side or server side? How do you efficiently update each one without incurring a sql statement for each update?
All of these questions require deep knowledge of the inner workings of the ORM, when in SQL, this is a lot more straightforward.
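For contrast, here is how explicit that buffering control is at the driver level. A Python/sqlite3 sketch (the table and batch sizes are invented): the cursor pulls a bounded chunk at a time, and nothing instantiates one object per row unless you ask for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO events (score) VALUES (?)",
                 [(i % 10,) for i in range(10_000)])

# fetchmany() streams a bounded chunk at a time: you decide the buffer
# size, and no per-row objects exist beyond plain tuples.
cur = conn.execute("SELECT id, score FROM events")
batches = 0
while True:
    chunk = cur.fetchmany(1000)
    if not chunk:
        break
    batches += 1

print(batches)  # 10
```

With an ORM you can usually get the same behavior, but you have to know which knob (lazy iteration, batch size, identity-map eviction) controls it.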
Not really. It isn't a SQL issue; it's a database driver problem.
The Django ORM takes all of an hour to read up on, and after that you have 90% of the use cases covered. Considering that I've been using that framework for many years now, the overhead of that knowledge is irrelevant.
You're going to love this: around the time I left one place I worked, a top engineer gave a meeting on using Hibernate for... drum roll... analytic queries. My brain fried with questions of why, how is this better, and this is definitely the result of someone who hasn't touched real SQL in a while and just wanted to use the same hammer for everything.
I tend to use Active record for simple queries. And for anything complex, I write a SQL query to define a view and build a view backed model. That way I can use SQL when I want/need to, but I can keep it contained in one place, and I don't have constantly rewrite tedious queries.
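A minimal sketch of that view-backed idea, using Python's stdlib sqlite3 rather than Rails (the table and view names are invented): the gnarly SQL is written once as a view, and everything else queries it like an ordinary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total) VALUES
        ('acme', 100.0), ('acme', 50.0), ('globex', 75.0);

    -- The complex SQL lives in exactly one place: the view definition.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(total) AS total_spent
        FROM orders GROUP BY customer;
""")

# Application code (or a view-backed model) queries the view
# as if it were an ordinary table.
rows = conn.execute(
    "SELECT customer, total_spent FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)  # [('acme', 150.0), ('globex', 75.0)]
```

In Rails the last step would be a model whose table name points at the view, so ActiveRecord's query interface still works on top of it.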
That's not true. A simple job might be 3 lines of code in an ORM to update a record. In SQL that will be a lot more code especially if that's wiring up a foreign relationships. With SQL you will also have a lot of uncheckable strings containing code.
Simple tasks are done maybe thousands of times in any one application. It's the complex tasks that are rare.
>A simple job might be 3 lines of code in an ORM to update a record. In SQL that will be a lot more code especially if that's wiring up a foreign relationships.
That's not a fair comparison unless you include the time and effort involved in creating your ORM models, before you can even write those 3 lines of ORM code. The effort to construct those models isn't even fully amortized over your project, since it must be maintained through schema migrations.
ORMs are like putting on an exoskeleton to go buy groceries because it will let you carry your bags more easily. In my opinion, the complexity and indirection they introduce for accomplishing simple tasks do not justify their existence in the vast majority of cases.
You would need to show me your DDL statements because that's the equivalent. I can generate the model from the database or the database from the model.
It's not much more effort to type "create table employees" with all the fixings as it is to type "class employees" with all the fixings.
>I can generate the model from the database or the database from the model.
Depends on the ORM. Just like "raw sql mode", not every ORM supports that. It's not an inherent feature of being an object-relational mapper, it's part of the bells and whistles of some ORM packages. And I'm going to guess you're coming from the Python/scripting universe, because generating models in other languages is definitely more complicated.
But let's pretend that all ORM software can generate model definitions from the database. Just do the update then. I want to validate your argument that a simple update in SQL is "a lot more code especially if wiring up foreign relationships." If you think an ORM is much less code, or simpler, I'd like to see how.
I think I can give a better example that isn't purely about performing updates, but is still about "simple things" and not particularly complex flows.
In my API, I have models from my ORM. These can then be used for database migrations, which allows changes in my database structure to be checked in to version control.
From there, these models are obviously used for queries within my app. Generally, you are correct that generating a simple query with the ORM is not meaningfully easier than writing the SQL myself, with the one exception being that the ORM sanitizes inputs to my queries without me having to think about it at all, which is nice.
From there though, I can use the models from my ORM to automatically validate incoming requests to the API endpoints, as well as automatically serialize the results of my queries when I want to send a response back to the client. If you're not building a REST API, obviously this might not be particularly useful, but for someone that is, it has saved me a lot of work.
Finally, generally I find code in my code is a bit easier to debug than strings.
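The input-sanitization point above is easy to demonstrate with any driver that supports placeholders. A Python/sqlite3 sketch (schema invented): placeholder binding sends the value separately from the query text, so hostile input can never change the shape of the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Alice')")

# Untrusted input that would break (or exploit) a string-built query.
evil = "x' OR '1'='1"

# Placeholder binding: the driver passes the value out-of-band, so the
# input is treated purely as data.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (evil,)).fetchall()
print(safe)  # [] -- no match, no injection

# String interpolation, by contrast, lets the input rewrite the WHERE clause.
unsafe = conn.execute(f"SELECT * FROM users WHERE name = '{evil}'").fetchall()
print(unsafe)  # [(1, 'Alice')] -- every row leaks
```

ORMs do this binding for you on every query, which is the "without me having to think about it" part.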
Your example makes for good SQL but a simple ORM example:
    var employee = new Employee { Name = "Bob" };
    employee.Manager = new Employee { Name = "Jill" };
    context.Employees.Add(employee);
    context.SaveChanges();
This would insert a new employee record for Jill, grab the primary key, then insert the record for Bob with his ManagerId field set to Jill's id. If Bob weren't a new employee, it would perform an update instead. If there were more related elements at different levels of depth, the ORM would order the inserts/updates to get the necessary keys and wire everything up. After this block of code, the objects are all updated with their new primary key values. Everything is executed in a single transaction.
The ORM is performing operations on objects -- it's pretty much right there in the name -- it's going to be a poor choice for bulk updates because that's not what it's for. But again, nothing stops you from using the right tool for the right job -- whether that be an ORM or raw SQL.
Yeah, your "uncheckable strings" (which could be checked by spinning up a testing database) are replaced by 700k lines of code that, although tested, are still full of bugs.
The Hibernate repo is 700 thousand lines of java code!
There's a tendency amongst some (especially "enterprise" programmers) to forget that dependencies are also just code. If they break, you are on the hook too. You own that complexity.
Just because a job is simple doesn't mean it doesn't take time & effort. A tool can be helpful if it reduces the time or effort to accomplish something.
I don't think that is what parent wanted to say. It's still a lot of boilerplate code and config that can be avoided by using an ORM for the 'simple' parts of an application. And if you use a framework like sqlalchemy, you still have access to the lower levels if it turns out the ORM abstraction is unsuitable.
Whenever I get to the point where I'd do something gnarly with a query, I just fall back to SQL statements with results that the ORM can parse; or just dump directly into data structures and go from there, even if the originating tables still have ORMs.
If your framework/library/language doesn't let you skip the ORM - or populate the object with the results of a custom query - well, I'd say that's a failing of that ORM, not of ORMs in general.
I remember the very first time I had to do anything interactive on a web page. We had a long list of items in a table, and as a stopgap for adding search functionality we were going to sort the table by multiple columns.
Stack Overflow was still a twinkle in Atwood's eye. So I went googling about for stable sort implementations in JavaScript and I found plenty. Except they all had the same problem: they were all slow as molasses. I should have taken it as a huge warning that none of these implementations' demos showed sorting of more than 10 items.
I needed to sort 100 items. 300 at the outside. Ultimately I had to go to source material and implement it myself.
Clearly there is some notion in the software community that the market for ideas is inexhaustible. That the 'oxygen' in the room is infinite, and therefore I can interject whatever half-assed concept I have into it without costing anybody anything.
Virtually all of us, when we tackle a problem, try to do better than what is already on offer. But what if the thing on offer is truly horrible? If my only goal is 'better' instead of 'good', then my alternative will still be really bad. And in a field full of bad, who wants to be the person who introduces the 6th standard? The 8th?
Keep your dumb ideas to yourself, or put them in question form and ask people why
After someone posted Norvig's Sudoku solver, I was troubled by his statement about how many algorithms he'd have had to implement, so he just did brute force. So I took a whack at it and got fairly far down the deductive path before things got hard. But I'm not going to show it to everybody. The internet has been working on strategies for 10 years, and they've done way more algorithms than the ones I knew about. The best I've managed is to maybe simplify a couple of rules, but I think one could argue that it's the wrong definition of 'simple'. It can't handle as many cases, but I can explain it to anyone. Is that enough to add to the noise? Probably not.
As someone who’s been using mostly Clojure recently, wouldn’t this apply to programming language based object systems, too? Maybe the problem with the object relational impedance mismatch is the objects.
There's absolutely nothing in this world that has frustrated me more than knowing how to write a query in SQL but, for the life of me, not being able to express it using the ORM syntax.
Since then I keep using ORMs (because mapping the things you pull from the db to actual objects is undoubtedly good) but I write my queries by hand.
I also dislike ORMs that, by default, when you try to access an object you forgot to pull from the db with the query you wrote, automatically generate another query to pull the data instead of erroring out.
For example, Ecto isn't technically an ORM but it lets you do ORM-like things. It's Elixir's data mapping and query language tool. It happens to be one of the nicest "I need to work with data" abstractions I've ever used.
It's a bit more typing than ActiveRecord and even SQLAlchemy, but you feel like you're at a good level of abstraction. It's high enough that you're quite productive but it's low enough that it doesn't feel like a black box.
You get nice benefits of higher level ORMs too such as being able to compose queries, so you can design some pretty compact and readable looking functions, such as:
    def eligible_discounts(package_id, code) do
      __MODULE__
      |> for_package(package_id)
      |> with_discount()
      |> active()
      |> usage_count_less_than_usage_limit()
      |> after_starts_at()
      |> before_ends_at()
      |> maybe_code(code)
    end
Each of those function calls is just a tiny bite sized query and in the end it all gets composed into 1 DB query that gets executed.
I think Ecto's biggest win was having the idea of changesets, schemas and repos as separate things. It really gives you the best of everything. A way to ensure your data is validated but also flexible enough where you can separate your UI / forms from your underlying database schema. You can even choose not to use a database backend but still leverage other pieces of Ecto like its changesets and schemas, allowing you to do validate and make UIs from any structured data and then plug in / out your data backend (in memory structs or a real DB, etc.).
Initially ORMs can save time when developing as you get an easy mapping between objects and the database.
However, in practice ORMs tend to give you quite horrible JOINs that frankly are hard for humans to understand.
Furthermore, I think ORMs can lead to bad practice in the sense that you do not need to think about your data layout first.
But for database performance, data layout is of utmost importance. You need to have the data laid out in a form that makes the application run fast. ORMs do not necessarily provide that. You need to normalize the database.
ORMs save you time during initial development, but you pay later in the maintenance phase, when complex queries that humans may not understand are hard to optimize for performance.
This is completely false. The joins I do in the ORM are completely transparent to me as a developer and there is no mismatch whatsoever. The data layout and data migrations are well described.
And I have, at most, a half dozen difficult queries which require serious optimization, and that optimization isn't defined by the query but by the indexing and storage strategy for the tables in question.
> The joins I do in the ORM are completely transparent to me as a developer and there is no mismatch whatsoever.
Care to provide any examples with comparisons to ANSI SQL or any major SQL platform?
*take note that “SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987.” the same cannot be said for any ORM.
How many of these are a result of features that have nothing to do with getting data in and out of the database?
Get bare metal. Anything caked on adds friction.
That doesn't mean "write raw SQL". That means I want just enough of an ORM to give me type-safety for my queries and updates. Leave the rest at the door.
This topic pops up frequently here on HN and every time I’m shocked at how many people have issues with ORMs! I’ve been using Hibernate/Spring Data for several years now and never ran into any issues. If I need to write a complex query, I can easily write a @Query annotation in HQL and it neatly fits right in to the repository class. I also develop with query logging enabled so I have better understanding of the queries actually ran on the DB. I think it really boils down to using the right tools, the right way.
This topic pops up frequently here on HN and every time I’m shocked at how many people have issues with ORMs! I’ve been using Hibernate/Spring Data for several years now and never ran into any issues.
Same. Hibernate with Spring Data JPA, for me, "just works". But it's not necessarily the right solution for every database interaction. But for a typical microservice which might have up to 20-25 discrete entities, and few if any crazy complex relationships, I find that I plug it in, extend CrudRepository, write some JPQL queries in @Query annotations here and there, and bob's yer uncle.
If you were writing a monolithic CRM system with 900 domain entities, super complex rules governing the inter-relations between them, and trying to write your analytics queries right into the app, then an ORM approach would probably fail miserably.
I'm trying to pick up FeathersJS right now and I will tell you it is hard to find ANY examples of complex queries translated to the ORM's logic. I can find bits and pieces but without any sense of the underlying purpose.
There are 500 tutorials of installing Feathers and making a CRUD app, but something as simple as combining results from two requests or doing a raw DB request seems to exist only deep in the docs (and more often than not, requires bonus functionality from an add-on).
I am familiar with managing databases and I can see benefits of ORMs, but I feel like "glue" technologies could benefit from deeper real-world examples that move beyond day 1 tutorials and get into translating existing code to their style. Half the time I am left digging through outdated github code to grasp the basics of how and where mid-level functionality can be used.
Same here. Hibernate team had always, even as early as 2003 when I first started using it, advocated for turning on show_sql=true in development. If the code interacting with the DB fetches a table - rows and columns - of data, say for displaying a report, ORM does not buy you much. Instead, if the code converts the ResultSet to entities in an OO language you always end up using an ORM - a hand-rolled one, or an off-the-shelf one. I just don't understand the hate.
I have to congratulate you for having worked with a quite sophisticated and complex piece of technology for several years and never run into any issues.
And if you either lack the skills to properly use JPA/Hibernate or really have super custom special requirements, you can at any time execute native queries with the EntityManager. Still not enough? You can even directly work with Connection objects.
> I’m shocked at how many people have issues with ORMs
Simple inexperience. I'd bet most of those people are mid-level developers who have used ORMs enough to hit the rough edges but not enough, or with enough independent agency, to have worked through how to play to ORMs' strengths while avoiding their weaknesses. People who were given a hammer and are just realizing that their hammer doesn't work very well for installing bolts, and who maybe don't have the authority to say "maybe I should use a wrench" or the flexibility to try this new crescent-shaped idea and see if that works better.
Or maybe you worked only on “toy” projects and never understood that you are doing extra work that could be expressed more elegantly in a functional SQL expression instead of using the imperative statements of the ORM engine?
The craziest thing in this discussion is that I have to defend SQL that is probably my least favourite language... I never expected this honestly.
My "toy" project with several hundred tables and tens of thousands of users does fine.
I have, at most, a couple dozen "complex" queries in this project.
Whereas I have an order of magnitude more queries that need to be composed from several different query criteria, a task for which SQL is very poorly optimized for and most ORMs excel at.
I have used my ORM for so long that writing a report in SQL or the ORM language is basically the same to me. Neither technology is something I would consider to be "hard", as most of the problems encountered in practice are well-trodden.
Nonetheless, a simple ORM query is 20% the length of an equivalent SQL query. And I can compose them trivially. And then I use the model code for the _hard_ part: dealing with the rest of the business logic for template rendering, email sending, API interactions, and so on, for which SQL is completely useless.
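Composing a query from runtime criteria is the kind of thing ORMs automate. A minimal sketch of that composition, assuming table and column names come from trusted code rather than user input:

```python
def build_query(table, criteria):
    # Compose a parameterized WHERE clause from an arbitrary dict of criteria.
    # Values are always bound as parameters; only trusted identifiers are interpolated.
    sql = f"SELECT * FROM {table}"
    params = []
    if criteria:
        clauses = [f"{column} = ?" for column in criteria]
        params = list(criteria.values())
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_query("orders", {"status": "open", "region": "EU"})
```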
Reality does seem to paint a very specific picture. Django: ORM. Rails: ORM. I think just about every super popular framework either created or has official support for an ORM. Most popular language platforms have decent enough ORM options. Phoenix? Let's just call Ecto an ORM.
Thinking about projects I'm aware of.. Stack Overflow created an ORM regardless of if they call it "Micro". They created it and people use it. GitLab uses ActiveRecord.. Gogs and Gitea use ORMs that are a bit fringe TBH.. I actually can't think of a large project I've touched that went completely raw dog on the SQL.
I do believe there are instances where using an ORM, or at least all of an ORM, is not the best choice. But to come out and say they should always be avoided? It's interesting how many people are coming in here SO SURE about that position when it seems completely at odds with what the rest of the industry is, objectively, doing. And when pressed their insights into that opinion are so shallow a baby couldn't drown in them. Hmmm.
It's condescending and convenient for you to discount the people who disagree with you as merely inexperienced.
You could level the same empty claim at people who like ORMs: I loved ORMs when I was a beginner but learned it's best to avoid them as I accumulated experience. So anyone who likes ORMs is merely in that beginner stage.
And regardless, why should we develop things that only "experienced devs" can work with without blowing a hole in their face? Junior and intermediate devs often make up the larger share of bigger projects.
Even if you use an ORM you should probably know SQL so you can use them efficiently. Also, if you use an ORM, make sure it "starts with database schema" not with objects. Otherwise the mismatch will probably be horrible for you. A dozen years ago I published this article on generating your domain objects from the database, still a reasonable thing to do if you want the added query expressiveness the ORMs have over most SQL generators.
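A toy sketch of that schema-first direction, introspecting a SQLite table to emit a domain class (real codegen tools do far more, and all names here are illustrative):

```python
import sqlite3

def generate_class(conn, table):
    # Introspect a table and emit dataclass source for it (schema-first codegen).
    # A real generator would also emit imports, keys, relations, etc.
    type_map = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = ["@dataclass", f"class {table.capitalize()}:"]
    for _cid, name, col_type, *_rest in columns:
        lines.append(f"    {name}: {type_map.get(col_type, 'str')}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER, name TEXT)")
source = generate_class(conn, "author")
```

Change the schema, regenerate, and the domain classes follow, which is the workflow the comment describes.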
A feature of Netbeans I always liked was the ability to create a Hibernate Java Object directly from a database table. It does make sense to me to start at the DB first because the application is just a broker between the front end and the DB so it's more flexible to make the adaptations there.
This article was well balanced. I find the title just a bit misleading, since "What ORMs have taught me: just learn SQL", to me, carries a slightly different message than "What ORMs have taught me: learn SQL".
The second sentence is a little more compatible with the first line, "they can be used to nicely augment working with SQL in a program, but they should not replace it."
That's certainly how I use them. ORMs can save you a lot of irritating typing where it comes to insert and update statements. Aside from that, I write a lot of raw SQL.
I have seen what I would describe as ORM-induced, yaml-induced database damage. This isn't meant as a criticism of these tools per se, they're perfectly compatible with a well designed app. But I have noticed that people sometimes create databases that are useful only in the context of their configuration-file/ORM heavy app. Essentially, the programmers conceive of their data as a set of objects, and they use the config file to store global constants and the ORM to persist objects, almost as if they're pickling and retrieving them back into the system.
The result is a database that can't really be queried with SQL, more or less useless outside the context of the application. I firmly agree with developers who maintain that information will outlive an application, a database will outlive the software that was originally designed to use it (perhaps in parallel with it). I think a SQL database should be useful all on its own as a data source. If you got rid of the app, you'd lose a lot of operations on that data, a lot of UI, a lot of valuable things, but you'd be able to get at and use your data. If that's not the case, I'd seriously reconsider the design.
Kind of hard to do that without understanding SQL, so yeah, definitely learn it.
I too have gone back to SQL after working with ORMs for 10+ years. Having worked with them on a wide range of projects and teams, I can say, without any reservation, they are not worth it.
Not worth it for the small projects nor the large projects. They significantly complicate the development workflow and add another layer of (often times, cumbersome) abstraction between the user and the data.
If you encounter any issues (and better pray you don't), expect to spend hours stepping through reams of highly abstracted byzantine code, all the time feeling guilty when you just want to open a database connection and send the SQL string you developed and tested in minutes.
I've ripped out broken ORMs on multiple projects with over-engineered domain models designed by people with no apparent knowledge of how to do proper database design. This is the key problem with ORMs: they lead to lots of unnecessary joins, just so you can pretend databases do inheritance, or so all those tiny objects you will never query on can have dedicated tables with indexed columns. It's stupid. Such systems are also stupidly slow, fragile, and hard to maintain.
I've been on a project where we had 20+ tables. After I made the point that the sole purpose of this database was producing json documents through expensive joins that were indexed and searched in Elasticsearch (i.e. this was a simple document db), we simplified it to a handful of tables with basically an id and a json blob; got rid of most of the joins and vastly simplified the process of updating all this with simple transactions and indexing this to elasticsearch with a minimum of joins and selects.
We also ripped out an extremely hard to maintain admin tool that was so tightly coupled to the database that any change to the domain made it more complicated and hard to use because the full madness of the database complexity basically leaked through in the UI.
ORMs don't have to be a problem but they nudge people into doing very sub optimal things. When the domain is simple, the interaction with the database should be simple as well. We're talking a handful of selects and joins and simple insert/update/delete statements for CRUD operations. Writing this manually is tedious but something you do only once on a project. With modern frameworks, you don't end up with more lines of code than you'd generate with an orm. All those silly annotations you litter all over the place to say "this field is also a column" or "this class is really a table" get condensed in nice SQL one liners that are easy to write, test, and maintain.
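Those "nice SQL one liners" for CRUD, written once, might be condensed into a small store class like this sketch (names are illustrative):

```python
import sqlite3

class UserStore:
    """Hand-written CRUD: each operation is one explicit, testable SQL statement."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)")

    def insert(self, email):
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid

    def get(self, user_id):
        return self.conn.execute(
            "SELECT id, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()

    def update_email(self, user_id, email):
        self.conn.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))

    def delete(self, user_id):
        self.conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

store = UserStore(sqlite3.connect(":memory:"))
uid = store.insert("ada@example.com")
```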
I whole-heartedly agree. The thing about ORM is exactly that it is so easy to get a false sense of security of "everything working" while you're clearly doing sub-optimal things all the time. It's just a fact that in order to do proper, extensive SQL operations you have to know SQL, and to learn it you should write a lot of SQL. ORM gets you up and running quickly, but in the end I think it's better learning it the hard way first and _then_ being smart enough, hopefully, to know when it's better to use ORM than doing everything by hand.
I understand both points of view, with ORM saving my brain from writing massive JOINs on multiple tables yet at times forcing me to dissect an ORM query because it's doing something stupid. But if I didn't use ORM I would have probably not even made those stupidly complicated tables that I have to debug in the first place. Maybe even worse is that instead of learning SQL I had spent all my time learning the ORM's API.
It's a balancing act, with good points on each side. I like writing my own SQL, I think it makes me think harder what I'm doing. Sure then I'll be probably writing my own helpers that might resemble a half-assed ORM but as long as it is kept simple, outside of the hands of those who wish to over-abstract everything with their fancy design patterns, it should be highly efficient and easy to understand. And the best of all, my understanding of SQL will be a lot more useful than knowing some language-specific ORM.
> ORMs don't have to be a problem but they nudge people into doing very sub optimal things.
I don't think good ORMs do any nudging. The issue arises when people assume that because they are using an ORM they don't have to learn the underlying DB. ORMs should be treated as tools that sit on top of your SQL knowledge and allow you to do certain types of things easier.
Like any tool, there are inappropriate use cases. On one side you have people using an ORM to implement a document store, on the other side you have people that end up hand-rolling a crappy ORM because they thought they didn't need one.
I don't do object relational mappings generally. I instead query by id or a few other columns and construct objects from json documents stored in text columns.
Frameworks for that are awesome and a lot easier to deal with, and the serializing/deserializing overhead is typically minimal. Columns in databases only have two purposes: indexed columns for querying (ids, dates, names, categories, etc.), with or without some constraints, and raw data (json, or for simple structures, some primitive values). Some databases even allow you to query the json directly but in my experience this is kind of fiddly to set up and not really worth the trouble. The nice thing is that most domain model changes don't require database schema changes this way because the only thing affected is your json schema. This makes iterating on your domain model a lot easier. You still have to worry about migrations of course.
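The pattern described, indexed columns for lookup plus a json blob for everything else, can be sketched like this (schema and data are made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Indexed columns for querying; everything else lives in the json body.
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, created TEXT, body TEXT)")

def save(doc_id, created, doc):
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
        (doc_id, created, json.dumps(doc)),
    )

def load(doc_id):
    row = conn.execute("SELECT body FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None

save("order-1", "2024-01-01", {"customer": "Ada", "lines": [{"sku": "X", "qty": 2}]})
order = load("order-1")
```

Adding a field to the order affects only the json schema, not the table, which is the iteration benefit being claimed.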
The added value of using a database is being able to manipulate them safely with transactions and query them efficiently. Bad ORM ends up conflicting with both goals and the added value of well implemented ORM is usually fairly limited. At best you end up with a lot of tables and columns you did not really need mapped to your objects and classes.
A good table structure often makes for a poor domain model and vice versa. The friction you get from the object relational impedance mismatch is best avoided by treating them as two things instead of one. Bad ORM shoves this under the carpet and in my experience does not address this (other than by providing the illusion this is not a problem).
> I instead query by id or a few other columns and construct objects from json documents stored in text columns.
> Columns in databases only have two purposes: indexed columns for querying (ids, dates, names, categories, etc.) with or without some constraints, and raw data (json or for simple structures some primitive values.
You are basically describing a sort of ad-hoc document store with potentially limited ability to query. You've lost many of the benefits provided by a relational DB. Your ability to run large update queries or reports will be limited (unless your DB provides native json support, which as you said, is fiddly).
If you are going to do this, why not use a NoSQL document store with support for transactions? Then you will get a tool that is designed to work with your use case.
Edit: If you use an ORM, a hybrid approach is possible, where you store some properties as separate columns and then store the less frequently accessed (or more dynamically structured) data in a json field (which you can deserialize on hydration or on request). The main downside of this hybrid approach is that moving a property out of the json field into a normal column would require using that fiddly native json support or a fairly slow migration that would need to go through and deserialize each json field.
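The migration downside mentioned for the hybrid approach, promoting a property out of the json field into a real column, might look like this sketch (names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, email TEXT, extra TEXT)")
conn.execute(
    "INSERT INTO profiles VALUES (1, 'ada@example.com', ?)",
    (json.dumps({"theme": "dark", "locale": "en"}),),
)

# Promoting a property out of the json field means a row-by-row migration:
conn.execute("ALTER TABLE profiles ADD COLUMN locale TEXT")
for row_id, extra in conn.execute("SELECT id, extra FROM profiles").fetchall():
    data = json.loads(extra)
    locale = data.pop("locale", None)  # pull the property out of the blob
    conn.execute(
        "UPDATE profiles SET locale = ?, extra = ? WHERE id = ?",
        (locale, json.dumps(data), row_id),
    )
```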
> A good table structure often makes for a poor domain model and vice versa.
Can you clarify what you mean? This has not at all been my experience so I am curious and would love to see some examples.
Transactional semantics are problematic with a lot of nosql databases but I've used a few and you can work around this if you have some kind of consistency checks using content hashes. Postgres is pretty nice these days for a wide variety of use cases; including nosql ones. And it does transactions pretty nicely.
In short, there are lots of things you'd do differently in an OO domain model vs. properly normalized tables. A lot of what ORMs do is about taking object relations and mapping those to some kind of table structure. You either end up making compromises on your OO design to reduce the number of tables, or on the database side, ending up with way too many tables and joins (basically most uses of ORM I've encountered in the wild).
For reporting, you can of course choose to go for a hybrid document/column based approach. I've done that. In a pinch you can even extract some data from the json in an sql query using whatever built in functions the database provides. Kind of tedious and ugly but I've done it.
Or you can use something that actually was built to do reporting properly. I do a lot of stuff in Elasticsearch with aggregations and it kind of blows most sql databases out of the water for this kind of stuff if you know what you are doing. In a pinch, I can do some sql queries and I've also used things like amazon athena (against json or csv in s3 buckets) as well. Awesome stuff but limited. Either way, if that's a requirement, I'd optimize the database schema for it.
But for the kind of stuff people end up doing where they have an employee and customer class that are both persons that have addresses and a lot of stuff that is basically only ever going to be fetched by person id and never queried on, I'll take a document approach every time vs. doing joins between a dozen tables. I also like to denormalize things into documents. Having a category table and then linking categories by id is a common pattern in relational databases. Or you can just decide that the category id is a string that contains some kind of urn or string representation of the category and put those directly in a column or in the json. You lose the referential integrity check on the foreign key of course; but then you should not rely on your database to do input validation so that check would be kind of redundant.
> Postgres is pretty nice these days for a wide variety of use cases; including nosql ones.
um... what? Are you meaning to say that Postgres does a pretty good job as a document store? (not synonymous with "nosql")
Despite what that wikipedia article says, most (if not all) of the "impedance mismatches" described apply to most document stores as well. I would be curious to hear which of the mismatches described in that article you think are avoided by using Postgres as a document store. In my mind, the reason for using a document store is to have flexibility in the structure of your data (which can be a positive or negative depending on your needs).
> Or you can use something that actually was built to do reporting properly. I do a lot of stuff in Elasticsearch...
Of course there are document stores with good reporting. I was talking specifically about the downside of using Postgres as a document store given your complaints about its native json support being fiddly.
> But for the kind of stuff people end up doing where they have an employee and customer class that are both persons that have addresses and a lot of stuff that is basically only ever going to be fetched by person id and never queried on, I'll take a document approach every time vs. doing joins between a dozen tables. I also like to denormalize things into documents.
I often de-normalize addresses in my tables, but that choice is based on how you will want to store and update that data. A separate address table is good if you want to be able to automatically propagate address edits between records. A de-normalized address is good if you want to keep a record of that address for the purpose for which it was used. De-normalization is always an option with a relational DB, but normalization is not always easy in some document stores.
> Having a category table and then linking categories by id is a common pattern in relational databases. Or you can just decide that the category id is a string that contains some kind of urn or string representation of the category and put those directly in in a column or in the json. You lose the referential integrity check on the foreign key of course; but then you should not rely on your database to do input validation so that check would be kind of redundant.
I'm not quite sure what you are on about here. You can use constraints on columns that are strings and you can have tables that are composed entirely of an indexed string column to point that constraint towards. Integer Ids are primarily used just to save space. (I don't really see how this is relevant.)
I don't see anything here to justify your assertion:
> A good table structure often makes for a poor domain model and vice versa. The friction you get from the object relational impedance mismatch is best avoided by treating them as two things instead of one.
To be frank, it sounds to me like you ran across a bunch of poorly designed DB schemas (or schemas you didn't understand the design decisions for) and decided that it must be impossible to design good DB schemas and so you just use unstructured document stores instead.
Even more advanced programmer: writing the ‘boilerplate’ queries out manually takes barely more time than composing them in an ORM, means less indirection, saves me a major dependency, and encourages me to think intelligently about each query no matter how boilerplate they might seem at the surface.
Super-advanced programmer: allowing my database structure to be influenced by the needs of an off-the-shelf ORM will make it worse.
What ORM do people use that influences the structure of their database? The ORM I'm currently using the most can do whatever I need with my Postgres DB.
Also, the ORM allows me to specify models that are not only used for structuring the database, but also for validation of incoming JSON requests and easily serialize queries back to JSON.
ORMs that I've experimented with tend to fall into one of two categories: either they treat the object model as prime, or they treat the relational model as prime.
The former almost invariably spurt out inefficient queries, or too many queries, or both. They usually require you to let the ORM generate tables. If you just want to have your object oriented design persist in a database, that's great.
The latter almost invariably results in trying to reinvent the SQL syntax in a quasi-language-native, quasi-database-agnostic way. They almost never manage to replicate more than a quarter of the power of real SQL, and in order to do anything non-trivial (or have things done in a way that lets your database server scale) they force you to become an expert in SQL anyway, PLUS an expert in how your ORM translates its own syntax into SQL.
And once you become more expert at SQL than your ORM, it's not long before you find the ORM is a net loss to productivity—in particular by how it encourages you to write too much data manipulation logic in code rather than directly in the database.
I think you may have only experienced bad ORMs then?
All an ORM needs is a mapping between database fields and object properties so a good ORM should allow you to separately define a mapping between your object model and relational model so you retain full control of both.
> it encourages you to write too much data manipulation logic in code rather than directly in the database
I find doing too much business logic related data manipulation directly via SQL to be an anti-pattern that creates significant problems with testing and separation of concerns.
ORMs are good at hydrating objects and persisting updates to those objects. Hand writing code to do this is a waste of time.
SQL is good at running reports and performing mass updates; an ORM that doesn't allow you to easily do this is bad.
Whereas I find doing too much business logic related data manipulation not performed by the database to be an anti-pattern that creates significant risks with testing and a source of data bugs.
My model of thinking is that any copy of data that isn't currently resting in the database is potentially stale; avoid round trips like the plague; get new data into the database as soon as possible.
For me and the way I work, it's less about good vs bad ORMs, rather more often a question of whether I even want my data hydrated into a special object at all. I've come to the realisation that for the kind of work I do, data objects almost always end up being an unnecessary layer of indirection that don't give me any real benefits—and they change the way you think, because every transform becomes an opportunity to write a method on an object and not a straightforward query.
> My model of thinking is that any copy of data that isn't currently resting in the database is potentially stale; avoid round trips like the plague; get new data into the database as soon as possible.
Nothing about an ORM stops you from persisting data as soon as it is ready or updating the state from the DB to ensure consistency (or from using transactions).
> rather more often a question of whether I even want my data hydrated into a special object at all. I've come to the realisation that for the kind of work I do, data objects almost always end up being an unnecessary layer of indirection that don't give me any real benefits
Yeah, if you don't need to use objects then there is no reason to use an ORM. Knowing the right tool for the job is critical, and objects and ORMs are not infrequently used when they are not needed.
In my work, updates are rarely atomic and business logic is complicated and intricate. It is extremely hard to know what data you will actually need, and so it makes sense to pass around a complicated object that has all the potentially needed state. This also gives me the option to separate logic about when to commit/rollback from logic about what to persist.
> because every transform becomes an opportunity to write a method on an object and not a straightforward query.
For me, this is a plus, not a minus :). Methods are easier to test and re-use as part of complicated business logic flows. They make it easier for me to manage when data gets synced with the DB without having to duplicate code.
> —and they change the way you think,
I am going to pay more attention to this and see where I may have made mistaken presumptions and used objects unnecessarily when I could use atomic updates or queries instead.
Everyone thinks about things in their own way I suppose, but perhaps a way to parse it could be to think about whether you're approaching data from a "load and store" mentality or a "truth and snapshot" mentality.
In my mind, unless you wrap the entire programming round trip in a transaction, all data sitting in variables are a snapshot of the past and thus stale by definition.
Ok, great, I’ve only experienced bad ORMs then, but if every ORM I’ve ever experienced, across multiple teams and companies, is bad, then what are the chances I’ll ever get to work with a good one? And if I always have to use bad ones, what’s the point in using them at all, when I get by just fine without them?
> What ORM do people use that influences the structure of their database?
Hibernate and JPA encourage designing your domain classes first and then generate the DDL from that.
> Also, the ORM allows me to specify models that are not only used for structuring the database, but also for validation of incoming JSON requests and easily serialize queries back to JSON.
Postgres has great JSON support; does the ORM do something with JSON that Postgres cannot?
> Hibernate and JPA encourage designing your domain classes first and then generate the DDL from that.
This is a feature, not requirement or need of the ORM. It seems pretty silly to let the existence of a feature prevent you from making designing the structure of your DB correctly.
> does the ORM do something with JSON that Postgres cannot?
Postgres's json functionality is used for manipulating and querying data stored in the DB.
I believe the poster is talking about deserializing and validating json from REST requests and serializing json for REST responses using the mapping defined for the ORM.
> I believe the poster is talking about deserializing and validating json from REST requests and serializing json for REST responses using the mapping defined for the ORM.
These are also things that the json functionality of Postgres can do. For example, look at to_json and json_agg.
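To illustrate the idea portably, SQLite's JSON1 functions play the same role here as Postgres's to_json/json_agg: the database itself produces the JSON payload for a response. A sketch, assuming your SQLite build includes the JSON1 extension (modern Python builds do):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)", [(1, "Dune"), (2, "Emma")])

# The database serializes the rows straight into a JSON array of objects,
# so application code never builds the response structure by hand.
payload = conn.execute(
    "SELECT json_group_array(json_object('id', id, 'title', title)) FROM books"
).fetchone()[0]
books = json.loads(payload)
```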
In my current company I use Postgres JSONB with Hibernate extensively. One of the benefits of the ORM is that json fields can be constrained to a fixed schema. For example, a tag list field can be a SortedSet of strings.
A couple other things I've learned:
* Never re-use complex types in both your API and your schema. These things evolve at different paces and you should never have to worry that a change to your schema will break an API (or vice-versa). The minimal extra typing to have dedicated API types is well worth it.
* Storing untrusted client-submitted JSON in your database is a terrible idea. This is a great attack surface, either by DOSing your system with large blobs or by guessing keys that might have meaning in the future.
I don't feel that an ORM is better than this personally. I know exactly what this is doing at all times. No magic, no guesswork about the philosophy of the software. This is probably faster as well.
I'm going to give you an equivalent example using JAX-RS and Hibernate:
  @Path("/things/{thingId}/tags")
  public class ThingTagsResource {
      @PUT
      @Transactional
      public Thing setTags(final @PathParam("thingId") long thingId, final SortedSet<String> tags) {
          final Thing thing = dao().load(Thing.class, thingId);
          thing.setTags(tags);
          return thing;
      }
  }
I think this code hews much closer to the programmer's intention, providing essential input validation with minimal boilerplate. It's also comparatively easy to test.
Sorry, maybe I wasn't being clear. I don't just validate that the request is JSON, I validate that the fields in the JSON are valid fields to send over the wire. I do this by automatically hooking my ORM models into my request validation. If a request doesn't specify a column that is NOT-NULL, for example, it will automatically send an error response telling the client that it needs to specify that column in the request JSON.
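Stripped of any particular ORM, the hookup being described reduces to driving request validation from the model's column metadata. A hypothetical sketch, where the COLUMNS dict stands in for whatever metadata your ORM exposes:

```python
# Hypothetical column metadata, as an ORM model might expose it.
COLUMNS = {
    "email": {"nullable": False},
    "name": {"nullable": False},
    "nickname": {"nullable": True},
}

def validate_request(payload):
    # A NOT-NULL column missing from the request JSON becomes a client error.
    return [
        f"missing required field: {column}"
        for column, meta in COLUMNS.items()
        if not meta["nullable"] and column not in payload
    ]

errors = validate_request({"email": "ada@example.com"})
```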
Python doesn't have type safety to begin with, so those sorts of checks have less utility to me, and since json.loads returns a dictionary and python objects are effectively dictionaries you are pretty much done.
Type annotations were recently introduced as a standard library feature as well though they aren't enforced and require a third party tool like you are describing. I've found that 90% of the time you don't actually care what type something actually is and instead you care what it needs to be.
Python doesn't force type safety but there are benefits to being type safe: ensuring things stay valid, bugs are caught, and defense in depth for malicious requests.
For example if I'm going to use a value in a query, because I'm using parameterized queries the type conversion to string happens implicitly so type doesn't actually matter. If I get 2 or '2' it all ends up as '2' and the database infers type by the column type.
If I need something to be an integer and I don't trust the upstream system then you have to:
int(*number*)
At the end of the day if my JSON is going back to JavaScript I can't trust types either so I have to take the same precautions.
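That int(*number*) coercion can be wrapped so 2 and '2' are both accepted while garbage from the untrusted upstream is rejected, a minimal sketch:

```python
def coerce_int(value):
    # Accept 2 or "2" from an untrusted upstream; reject anything else.
    # Note int() truncates floats (2.9 -> 2), which may or may not be wanted.
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError(f"expected an integer, got {value!r}")

a = coerce_int("2")
b = coerce_int(2)
```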
I gotcha. I work on a lot of user-facing stuff, so if the type is somehow wrong, I generally prefer to let the user know, which is a lot easier with models.
That is sensible and is generally the approach I take in user facing code. When I have my way I have validators on all input fields in the UI to warn users of invalid input. I like them to be very specific, checking to see if a phone number is formatted in such a way that it is usable by Twilio for example.
On the other hand services that aren't actively utilizing data I write to be fairly agnostic about that data. "be conservative in what you do, be liberal in what you accept from others." is sort of how I aim.
Is it ActiveRecord? I find it to be very flexible and convenient for 95% of my use cases. There are a few places where we drop to raw SQL but AR makes that very easy.
Programmer without anything in particular to prove: hey, I already know SQL, and ORMs create queries "under the hood" in ways I can reason about and control, so I'm going to use this ORM in a way that doesn't involve just throwing objects and tables together in a big pile and mooshing them all together with duct tape.
One of the biggest reasons why I don't use ORMs is because I try to avoid using objects at all unless there's a really good reason to do so. And when I forget to follow that principle it has always turned out to be a mistake; de-objectifying has consistently resulted in simpler, shorter code with fewer data bugs.
My working principle is to have data spend as little time as possible being thrown around within application code. I tend to find that the longer data spends being sieved through layers and tossed around inside your application, the more data bugs you'll end up having.
And when it comes time to display data to the user, it's rarely inconvenient to write an SQL query that fetches exactly what you want to display in exactly the right format and exactly the right order—obviating the need to have any "objects" that "understand" your data model.
The problem is that far too few programmers realise how deep the SQL rabbit hole goes; it's treated like a little side-hustle like regular expressions, when for so many programmers it's the most valuable skill to level up.
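A minimal illustration of fetching exactly what a page displays in one query (SQLite used for portability; schema and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT, views INTEGER);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Engines', 10), (2, 1, 'Notes', 30), (3, 2, 'Compilers', 20);
""")

# One query returns exactly the rows the listing page shows, already joined
# and ordered, with no intermediate objects to hydrate.
listing = conn.execute("""
    SELECT a.name, p.title, p.views
    FROM posts p JOIN authors a ON a.id = p.author_id
    ORDER BY p.views DESC
""").fetchall()
```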
If the object reasoning inherent in ORM design isn’t influencing your structure, either your structures are trivial or you don’t know how to use the full capabilities of your engine yet.
... or both. Both is always a possibility. Welcome to databases.
If the object part is influencing the relational part, either the ORM designer hasn't provided sufficient 'mapper' features or you haven't found them yet.
I agree that most ORMs are shit, and if you want to make specific complaints, I'll probably agree with most of them.
But if the choice of ORM is forcing you to design your database to its limitations, you should really be asking yourself whether it's time to switch to a different ORM.
Most perfectly viable database schemas in the real world are trivial by your definition of trivial. Trivial designs aren’t necessarily bad designs; sometimes quite the opposite.
I don't disagree with anything you've said. Though I might raise a very minor objection to the unspoken implication that the "most" database schemas which are currently trivial should be trivial. You're right that trivial designs are very often preferable. But I would hasten to add that when the application calls for data correctness, a bias towards triviality can occasionally manifest as a trade-off between complexity on the application side and complexity on the database engine side.
(As for the condescension, I agree with that too. It was aimed squarely at the GP in the marginal hope that he gets to experience his own tone mirrored back at himself. It might just offer him some insights into perspective.)
Personally, I find that when I run into a problem while programming and ask myself "is this library being stupid or am I?" it's rather useful to remember that "both" is always a possibility.
Of course, it's the internet, so dry british cynicism and condescension aren't as trivially distinguishable as one might hope. Sorry my tone didn't come across correctly.
Until you have a grid with a filter in your UI where the user can create a dozen different queries on the fly....
Why does your database structure have to be structured by your ORM?
The only ORMs I have used are LINQ-based ones, and they can model any database relationship.
Don't get me wrong, my first instinct when starting a project is to use Dapper, a micro-ORM from Stack Overflow that just maps a SQL query result to objects and doesn't generate SQL.
That's pretty much exactly how I feel. I gave a talk a few years ago about doing advanced SQL things in ActiveRecord [0]. The talk suffered I think from lacking a unifying idea and because I tried to offer something to the whole spectrum of experience (from "this is a join" to "this is how you can express a CTE/lateral join/window function in AR"), but the real unifying motivation/message was that with AR you can have your cake and eat it too.
I've approached tons of problems by making it work in SQL first and then translating it into AR afterwards (to some degree or other). I would reject an ORM without an "escape hatch", but AR is wonderful for taking away the boilerplate while still letting you write SQL when you need it, and even letting you make SQL more composable by defining scopes.
I'm just wrapping up a couple C# projects where everything is direct SQL, and oh man is it verbose and painstaking! Every day I long for more Rails work. :-)
As a note - I've done a lot of SQL in C#, and Stack Overflow's Dapper[0] library is _fantastic_ for dealing with that. You can pass in all the SQL you like, or a stored procedure, and it deals with virtually all of the query parameterization and boilerplate around getting results out of the DB. It's saved me hours of tedium and typing.
It's the perfect micro-ORM. Eliminates boilerplate, gets out of the way otherwise.
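Dapper itself is C#, but the core idea can be sketched in a few lines of Python (hypothetical helper, sqlite3 for illustration): you supply the SQL, and the helper's only job is to map result rows onto your classes.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def query(conn, model, sql, params=()):
    """Run SQL you wrote yourself; map each row onto `model` by column name."""
    cur = conn.execute(sql, params)
    cols = [c[0] for c in cur.description]
    return [model(**dict(zip(cols, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

users = query(conn, User, "SELECT id, name FROM users WHERE id = ?", (1,))
print(users)  # [User(id=1, name='Ada')]
```

All query logic stays in SQL; the mapping layer never generates it.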
AR is the gold standard. AR + extended gem + Arel covers almost 100% of the problem space. For everything else, use views or set returning functions and wrap them up in AR.
There are less drastic ways to reduce SQL boilerplate, though - such as simple SQL dialect implementations, in your language of choice. I'd almost always prefer that kind of interface, when there's a decent one to use.
Of course, your database may come with features that can help too (views, udfs, etc)
I'd generally only reach for an ORM in one circumstance: when my team already knows it well and can move fast with it. It should also be popular, so that it's likely new team members already know it and can move fast with it too. Otherwise, you're just putting unnecessary obstacles in front of your team, in most cases.
The phrase you may be searching for is 'informed consumer'.
The world is full of tools that experts use because they understand what the tool is doing, and why this is a good thing... most of the time.
Having the tool is a poor substitute for the knowledge that led you to use the tool instead of doing it by hand. There are times where you would not do a thing by hand and so you don't ask the tool to do it, and there are times you ask the tool to do something you would never do by hand.
There's a sensible middle ground here, although I agree with the thrust of the message because using an ORM doesn't obviate the need to learn SQL - something I think a lot of developers forget.
The other extreme from using an ORM "for everything" is using SQL "for everything", either via loads of handwritten ad hoc SQL, or stored procedures, UDFs, views, or a mix of all of these. This is just a different nightmare. And don't be fooled: it really is still a nightmare.
A sensible approach blends use of an ORM with handwritten SQL where needed. In fact most ORMs will allow you to do things like build collections of objects from custom SQL anyway, so there's really no need to shy away from it.
One other thing I'd say: I wouldn't necessarily trust my ORM to adequately design my database for me via a code first approach. It's more work but thinking about the data model and explicitly designing the database often yields better results, and you have more control. Code first is OK for simple stuff, but often even simple stuff becomes complex over time so I tend to shy away from it.
> A sensible approach blends use of an ORM with handwritten SQL where needed. In fact most ORMs will allow you to do things like build collections of objects from custom SQL anyway, so there's really no need to shy away from it.
This is really where it's at. Seriously people. Nobody should be writing raw SQL.
Give me just enough of an ORM/abstraction to give me type-safety, leave the rest at the door.
> Are you advocating that I put a bunch of magic strings in my codebase? That would be a maintenance nightmare.
Not a bunch, just the large batch queries and reports that use complicated joins. There are places where it makes sense to take the trade-off between maintainability and performance.
> Unless you think I'm suggesting that no devs should ever write SQL at all in their career? Which isn't the case.
It sure appears to be what you are suggesting. Perhaps you should clarify what you were trying to say?
In my opinion this whole problem is one of the strongest condemnations of OOP. Even if OOP really were a great way to model your data, to express relationships and property types and such (setting aside all the questionability of that claim), it's all just going to end up in a database anyway. Unless you're using a denormalized database like Mongo, or the bulk of your application state is non-persistent, your perfectly thought-out OOP model is at best going to be a terrible approximation of the real data model and you're much better off using a language/paradigm that lets you just treat data as data.
> Even if OOP really were a great way to model your data, to express relationships and property types and such, it's all just going to end up in a database anyway.
Of course. But when you operate on it, when your end user is typing into a screen, when you're validating the contents, you're doing it in the application, not in the database. RAM is where the data lives when it's being acted on.
It's like saying your Word processor should have no internal data structures different from how the file stored on disk.
OOP is a great way to model data stored in RAM and operated on. And an ORM is a great tool for persisting that structure to disk as needed.
Everything you can build with OOP can be done with FP and vice-versa. So the question is which one's strengths you'll most benefit from.
OOP's strengths surround modeling nested object structures and encapsulating state and logic within opaque containers. If you're primarily writing data to/reading data from a relational database, everything gets flattened and exposed. So you have to ask yourself what you're getting from it at that point.
Rendering and processing are well-suited to FP. "Email sending" and "notifications" are vague, but I see nothing about SQL + FP that makes those harder.
Listing prior discussions is not a reprimand suggesting we can't discuss it again. It's a service to the community to let us see what's already been said, if we so desire.
No sides as far as I am concerned. I just pick the best tool for the job. I love ORMs. I love plain SQL, be that via an ORM "jail break" or a lower layer.
Swapping from EF6 to Dapper was one of the best choices we ever made with our project stack. It is so relieving to be able to hand-tune queries and transactions now. Initially, we were sold on the apparent simplicity of EF6, but as with many things there is a cost for an abstraction like this. In our case, the performance penalties and opaqueness were dealbreakers after a while. We saw an average speedup of 10x on all SQL operations after rewriting using direct SQL with Dapper handling mapping of result sets into models.
Writing your own SQL migrations is actually incredibly straightforward if you just think for a few minutes about how you would do it if ORMs didn't exist. Many database systems have ways to store metadata like schema versions outside the scope of any table structure, so you can leverage these really easily in your scripts/migration logic. EF6 uses an explicit migration table which I was never really a huge fan of.
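For example, SQLite keeps a schema version outside the scope of any table via `PRAGMA user_version`; a hand-rolled migration runner built on it (an illustrative sketch, not the commenter's actual code) can be very small:

```python
import sqlite3

# Ordered migration scripts; the pragma records how many have been applied.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn):
    # Read the version stored outside any table, apply only what's missing.
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, script in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.executescript(script)
        conn.execute(f"PRAGMA user_version = {version}")

conn = sqlite3.connect(":memory:")
migrate(conn)   # applies both migrations
migrate(conn)   # no-op: already at version 2
print(conn.execute("PRAGMA user_version").fetchone()[0])  # 2
```

No explicit migration table is needed, which is exactly the contrast with EF6's approach described above.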
One thing we did do that dramatically eased the pain of writing SQL was to use JSON serialization for containing most of our complex, rapidly-shifting business models, and storing those alongside a metadata row for each instance. Our project would be absolutely infeasible for us today if it weren't for this one little trick. Deserializing a JSON blob to/from a column into/from a model containing 1000+ properties in complex nested hierarchies is infinitely faster than trying to build up a query that would accomplish the same if explicit database columns existed for each property across the many tables.
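A minimal sketch of that pattern (schema and field names are illustrative, not the commenter's actual models): a few relational columns for locating rows, plus one JSON column holding the full, rapidly-shifting model.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processes (
        id INTEGER PRIMARY KEY,
        name TEXT,
        updated_at TEXT,
        state_json TEXT   -- the serialized business model lives here
    )
""")

state = {"step": "review", "fields": {"amount": 1200, "approver": None}}
conn.execute(
    "INSERT INTO processes (name, updated_at, state_json) VALUES (?, ?, ?)",
    ("invoice-approval", "2024-01-02", json.dumps(state)),
)

# Relational columns do the filtering; the blob deserializes straight
# into the model, with no per-property columns to query or migrate.
row = conn.execute(
    "SELECT state_json FROM processes WHERE name = ?", ("invoice-approval",)
).fetchone()
restored = json.loads(row[0])
print(restored["step"])  # review
```

Adding or removing a property in the model requires no schema change at all, which is where the speedup for complex nested hierarchies comes from.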
If I were starting from scratch I would probably use a micro-ORM like Dapper or, in my case, PetaPoco, which I have experience with.
There are so many "gotchas" to using EF that I'm not sure it was worth using to begin with (for a complex project at least). Eight years into developing and maintaining a huge ecommerce platform built on EF, I think if I started over I would probably not use EF.
Care to elaborate on which aspect noted above is adverse from your perspective? I would be happy to provide more context and explain in more detail some of the reasoning involved in our decisions.
> Deserializing a JSON blob to/from a column into/from a model containing 1000+ properties in complex nested hierarchies
> complex, rapidly-shifting business models
Your business logic classes have 1000+ properties. And you plan to not migrate them when the schema changes but leave many instances with old versions of the schema sitting in the datastore. Your application logic is going to get nasty!
The other aspect here is that the lifetime of these objects is very brief for our application. Typically 10-60 minutes. Schema changes, while breaking, are tolerable along business cycle boundaries.
Some commenters have asked what kind of business would deploy this kind of technique.
I’ve done it. For MLS syncing software, there are lots of properties, not thousands but can be hundreds. And each MLS RETS has its own Schema, so for any kind of logic portability this is necessary.
Actually, storing the raw data as a blob is a flexibility technique and is a separate concern than the number of fields. As I can’t predict the future set of optimized queries I’ll need, and I don’t want to constantly sync and resync (some MLS will rate limit you), then this way I can store the raw data once, and parse plus update my tables/indexes very quickly.
I didn't quite understand fully how your solution worked in the end, are you storing the entire object graph as a JSON blob alongside the relational data in the table, or are you simply storing the JSON blob instead of using relational data?
It's difficult for me to picture how Dapper even comes into play when you're doing this trick with the JSON blob.
Why not use NoSQL?
Also 1000+ properties on an object? I know some domains sometimes surface these kind of extreme cases, but is there not an alternative to having 1000 properties in one object?
For certain abstractions, we are looking to maintain the state of a business process over time:
The relational concern is the storage of metadata sufficient to locate & retrieve the state. E.g.: integer primary key, name of process, current transition in process, some datetime info, active session id, last user, etc.
The 'non-relational' portion is simply a final 'Json' column per row that contains the actual serialized state. This state model can vary wildly depending on the particular process in play, so by having it serialized we can get a lot of reuse potential out of our solution.
In terms of the models, it's not 1 gigantic class with 1000 properties. Its more like a set of 100+ related models with 5-30 properties each.
Ah yup, that makes a lot more sense. I can imagine how fun that might be trying to do through EF.
It sounds like a workflow engine. I'm picturing one table that is very generic that tracks "this job id, this workflow type, this stage in workflow, this entity, this state of the entity" and a single job has multiple of those entries over the lifetime of that job execution and the JSON blob is the current state of things, so that you don't have to go and recompute that.
Yeah, it seems like a reasonable choice.
What sticks out to me in a scenario like that is a good CQRS implementation. The write side of it pumps in the history of the job execution, then denormalizers run to project that into a shape amenable to being read by the application.
I use Entity Framework for C# and I have grown to appreciate it. I get libraries for in-memory databases which makes it easy to write thorough unit tests, the `Include` function uses join to include foreign key objects in an optimal manner, and scaffolding tools make it easy to map from SQL to C# classes. The resulting SQL from the Linq expressions is logged which makes it easy to see what is going on, if you already know SQL. The only problem is that I think the writeback to the database of manipulated or new objects is not clear but that might just be because I haven't read the documentation for that thoroughly.
EF is fantastic to use. I think it's because while other ORMs try to integrate their logic into a language, EF was developed alongside linq, IQueryable stuff and language extensions. Basically, they changed the language to better accomodate stuff like EF. And it shows.
Advocating for the use of SQL over an ORM in every case is like advocating for the use of Assembly over C in every case.
In both cases, one is a higher level abstraction over the lower level capabilities, which can provide a quite large gain in usability and ability to easily understand what is going on at the level you are working at, for the loss of hand optimizing at a low level to get just what you want in every case.
Similarly to with Assembly and C, you can often drop to the lower level as needed for speed or other very specific needs.
In both cases a good understanding of the lower-level language will help you know when it's appropriate to drop down for performance or for a special feature, and when it doesn't matter, because the ops/SQL generated is near optimal anyway, or because the gains are almost certainly less than the problems caused from a maintainability perspective.
I'm perfectly happy to use an ORM for 95% of my DB needs. Just the query builders that generally ship with them are worth the price of their inclusion IMO (at least for the good ones), as it can greatly simplify queries that are variable based on different parameters you may have per run.
> Advocating for the use of SQL over an ORM in every case is like advocating for the use of Assembly over C in every case.
Comparing SQL to Assembly is probably one of the worst comparisons you can make. SQL is a very high-level and powerful declarative language that I would compare, if at all, with functional languages and not with assembly. A few lines of SQL are counting, sorting, grouping, aggregating, and merging data that would take dozens of lines if written row by row in a procedural language.
Yeah, basically an ORM is like a leaky abstraction over an already high abstraction and that's where the problem lies.
A better example would be having someone write C macros for you to implement your logic in. Like you can't call `open` to get a file handle, you need to use the `WRITE_TO_ABSOLUTE_FILEPATH(path, content)` macro every time you wish to write something to a file, because someone thought
"you know, C is complicated and what are we gonna do, rewrite all uses of open every time we might want to use a different flag".
Comes with its own limitations while being sufficient for _some_ people's use cases.
an analogy is not about comparing, it's about communicating an idea.
pointing out that the comparison is not perfect does not invalidate or refute anything. Every analogy, by definition, is not perfect. If the analogy were to ever become perfect, it would stop being an analogy and would instead be a tautology.
"an ORM is like an ORM".
IOW, the differences are WHY analogies are useful.
The point was not that the comparison was not perfect, but that it was utterly wrong. SQL is not at a lower level of abstraction than procedural code, but at a higher level.
Abstraction should make complex things easier, not harder.
What a weird statement. In many ways, SQL is a higher-level abstraction than an ORM. A better analogy might be functional versus imperative styles, but even that breaks down pretty quickly.
ORMs generally serve some subset of three purposes: 1) constrain the dynamic nature and expressiveness of SQL in such a way that it can work well in less expressive languages 2) serve as a bridge between a typed language and an untyped language and 3) a way to avoid your team needing to learn and work in two languages for backend dev (similar to arguments of using Node so that your frontend and backend can be the same language).
To claim one is superior to the other without context is just a very inexperienced statement to make. I'd highly encourage any team that is comfortable being a polyglot to consider avoiding ORMs and embracing the DB as much as possible.
> 3) a way to avoid your team needing to learn and work in two languages for backend dev (similar to arguments of using Node so that your frontend and backend can be the same language)
It's not about learning or working in multiple languages, it's about duplicating the specification of important details.
The holy grail would allow me to write critical business rules in one place, then make use of them everywhere. I want to only have to specify once that `email` is required on `contact`, and have that manifest as `NOT NULL` on the database, an API validation, and a little star next to the field on the UI. Unfortunately, there doesn't yet seem to be a solution to that problem.
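There's no standard solution, but a sketch of what a single-source-of-truth field spec might look like (all names and structure here are hypothetical) is: declare each rule once, then derive the DDL constraint, the API validation, and the UI hint from the same declaration.

```python
# One declaration per field; everything else is generated from it.
FIELDS = {
    "contact": [
        {"name": "email", "type": "TEXT", "required": True},
        {"name": "nickname", "type": "TEXT", "required": False},
    ]
}

def ddl(table):
    """Derive CREATE TABLE with NOT NULL from the shared spec."""
    cols = ", ".join(
        f["name"] + " " + f["type"] + (" NOT NULL" if f["required"] else "")
        for f in FIELDS[table]
    )
    return f"CREATE TABLE {table} ({cols})"

def validate(table, record):
    """Derive API-side validation from the same spec: list missing fields."""
    return [
        f["name"] for f in FIELDS[table]
        if f["required"] and record.get(f["name"]) in (None, "")
    ]

print(ddl("contact"))
# CREATE TABLE contact (email TEXT NOT NULL, nickname TEXT)
print(validate("contact", {"nickname": "Ada"}))
# ['email']
```

A UI layer could consume the same `required` flags to render the little star, so `email is required` is stated exactly once.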
Well, there is the option of writing all your CRUD methods as stored procedures which are the things that actually create/update records (while still allowing complex reporting queries), but that means writing all that logic in the provided SQL language.
That was one of the most interesting things to me about the Drizzle MySQL fork project. One of the goals was to make the SQL language that was used a plugin, so you could use just about whatever you want, JavaScript, Perl, Python, Ruby, Haskell, whatever, as the SQL server level. Being able to share client and server level code opens up some interesting possibilities. Alas, I think it ended up being abandoned.
I live in a world where tables have 10^12 rows and all the queries need to be manually optimized. In my world, an ORM is the most useless thing in the world. Different people, different needs. However, in all my projects SQL was more useful than ORM, except where the data models were so simple my mom could write the code for it.
If you have one table that you’re doing a query on, what types of optimizations are you doing to the query as opposed to the database - or even if you’re doing joins?
In general, for one table an index helps, but we have almost no queries on a single table. The query profiler tells us what part of the query costs the most (CPU, disk reads), and that is the starting point. Tuning a single query can take hours, depending on how slow it is and how often it will run.
There is some information on the Internet on query optimizations; look it up. There are also server-side optimizations, specific to the RDBMS you use and the flexibility it has.
Are you really doing OLTP transactions on a table with millions of rows that require complex searches or are you doing OLAP style reporting?
I don’t think anyone would suggest using an ORM for reporting, aggregating, etc. I would even go so far as suggesting using a different type of database/table structure - a columnar store and send data to a reporting database. A reporting query over millions of rows wouldn’t be real time anyway.
Just keeping the lights on in manufacturing plants with thousands of sensors and counters monitoring production. Data collection is useless if you don't put it to good use, and that requires some aggregation and correlation to tell what is going on on the floor.
My only infrastructure/data analytics experience outside of standard RDMS is with AWS so anytime I mention an AWS service feel free to substitute a non AWS/open source alternative...
For aggregation and reporting my usual go to would be Redshift - a custom AWS version of Postgres that uses columnar storage instead of rows and/or something like Kinesis (aka similar to Kafka) for streaming data and real-time aggregation while the data is being captured.
It's really not, though. C is just sugary Assembly that abstracts some things, but the design patterns are aligned. ORMs promote a fundamental mistake: coupling your object model to your normalized data model.
Great comment. Whereas typical object models have many relationships and, out of those, nested data, the optimization for SQL lies essentially in the other direction. Your DB should be optimized for speed, your code for use and readability. These are concepts that, IMO, are a classic trade-off against one another, and it doesn't make sense to combine them into a single abstraction.
The basic idea of the relational model was actually data independence, i.e. optimization for flexibility. If you are interested you can have a look at the original paper from 1970:
https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Your assembly/c analogy is more akin to comparing a shovel to a mini excavator. 95% of the time you’re better off using a query builder library for the convenience rather than locking your architecture into using an ORM that will, inevitably, cause long term headaches.
For the other 5% of the time, you’re just building a todo app, use whatever fancy general purpose libraries are hot at the moment.
The typical languages people use are imperative, where you state how to do something, while SQL is a declarative language, where you say what you want.
Since we are in the world of analogies, using an ORM is like taking a shovel and using it as a prop (without speaking) to explain excavator operator where to dig, how deep, how wide, what things to avoid etc. Except querying a database can be much more complicated. You might be successful with some simple tasks, but it will fall flat on more advanced things.
Anyway, perhaps if databases exposed an interface that allowed you to directly write a query plan, that wouldn't be so bad? But then many people would complain that you need to know a lot to use it, and also that, depending on the data, the plan for the same query might need to be different to be fast.
I used to do this by replacing slow queries with a dozen joins using stored procedures, insert/update/delete triggers, and materialized views.
The mindset was correctness and validity of a detailed and interrelated collection of known facts, at rest on disk.
Merging 2 of these, say when an insurance Corp buys a competitor, took detailed and painstaking effort by people with both the domain and systems knowledge.
It's very different now, with JSON doc stores, where in 5 years' time no one will know what timezone that happened in, or whether this person is that person with a different name because of a lossy UTF-8 to ASCII conversion.
OK, I was thinking the shovel gets you closer to the dirt and you can ‘manually’ choose where to specifically dig, or not dig.
Whereas, the mini-ex is a complex hydraulic machine requiring regular maintenance and an operator that knows how to use the controls, and a fuel source. Lots of dependencies.
There was a youtube video of a group of men using shovels cooperatively to ‘throw’ cement:
Well, LINQ is special: it is a SQL-like language embedded inside another language, and such code is then translated to SQL. The language is still not SQL but SQL-like, and it is incomplete; for example, you can't express a CTE in it.
Everyone should know SQL, but choose an ORM (micro-ORM) that doesn't have any abstractions and just gets data in and out. There are many solutions that offer you this without needing to write much SQL yourself.
Ok, what kind of models are you working on?
For me your comparison is completely inconceivable.
I don't like SQL much, but I'm fairly certain at this point that the various ORMs are supposed to be a higher-level abstraction but are in reality only an imperative facade over the mostly functional SQL language.
I use a data-mapping ORM and I've literally not had any of these concerns. They are all non-issues.
The first item about querying is relevant but all ORMs allow you to drop into SQL to execute a complex reporting-style query. It's not really necessary if you are just querying in objects to manipulate (which is what ORMs are good for).
This is the 'data mapper' vs 'active record' debate and I would argue both are sub-categories of 'Object-relational mapper.' I've seen multiple people in this thread even include query builders in their definition of ORM. I'm starting to wonder if much of the disagreement in this thread is caused by everyone having their own definition of ORM.
I'm okay with ORMs, but I can't stand lazy evaluation. It often leads to situations where it's entirely unclear when the program is making a database call.
For example, given the small django program:
    dbcall = models.Purchases.objects.filter(amount=100)
    if dbcall:
        do_something()
    if len(dbcall) > 5:
        do_something_else()
    third_thing(dbcall[0])
Does the above app make 1, 2 or 3 database requests? There is an answer, but it's not at all clear to the developer.
>There is an answer, but it's not at all clear to the developer.
A cursory reading of the documentation is usually a first step in acclimating to an otherwise unfamiliar system.
It just so happens, for your example here, that a complete explanation[1] can be found at the very top of what's likely the most vital subsystem's documentation. During due diligence, this information will be among the first things you encounter.
> A cursory reading of the [django] documentation is usually a first step in acclimating to an otherwise unfamiliar system. It just so happens, for your [lazy evaluation] example here, a complete explanation can be found at the very top ...
The "very top" of the django lazy evaluation documentation you pointed to hardly sheds much light on the issue. The documentation just says that certain operations cause database queries, and that some are cached. It's not clear whether doing two similar operations consecutively will result in two database queries (e.g., checking len(dbquery) twice, or checking for truthiness then pulling the first object from a database query).
The django documentation does differentiate between evaluated and non-evaluated database queries, but to the developer, it's the same object. Would be much better if an "unevaluated query" and an "evaluated query" were two different types with different semantics.
Worse, these lazy evaluation semantics are not terribly pythonic. Checking the length or truthiness of a string, list, set, dict, or tuple is an O(1) operation. Checking the length of a database query may (or may not) make network calls, and may have O(N) performance or worse.
Databases queries can give rise to performance bottlenecks and race conditions, and hiding what's actually happening from the developer is a recipe for all sorts of problems.
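As a rough illustration of the caching behaviour under discussion (a generic sketch, not Django's actual implementation), a lazily evaluated query object runs its SQL on first evaluation and answers `bool()`, `len()`, and indexing from the cached result afterwards, which is why the semantics are invisible at the call site:

```python
import sqlite3

class LazyQuery:
    """Runs the SQL once, on first evaluation; caches the result."""
    def __init__(self, conn, sql, params=()):
        self.conn, self.sql, self.params = conn, sql, params
        self._cache = None
        self.db_hits = 0

    def _fetch(self):
        if self._cache is None:
            self.db_hits += 1
            self._cache = self.conn.execute(self.sql, self.params).fetchall()
        return self._cache

    def __bool__(self):  return bool(self._fetch())
    def __len__(self):   return len(self._fetch())
    def __getitem__(self, i):  return self._fetch()[i]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (amount INTEGER)")
conn.executemany("INSERT INTO purchases VALUES (?)", [(100,)] * 3)

q = LazyQuery(conn, "SELECT amount FROM purchases WHERE amount = ?", (100,))
if q:            # triggers the one and only database call
    pass
len(q)           # served from cache
q[0]             # served from cache
print(q.db_hits)  # 1
```

The trouble the parent describes is that nothing at the call site distinguishes the evaluation that hits the database from the ones that do not.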
If you use Go and PostgreSQL, I’ve been working on a tool to help you “just use SQL” called sqlc[0]. It generates methods and structs for your queries, automatically. It’s made a dramatic difference in my day-to-day workflow.
I really like this! I particularly like that it is oriented around a useful workflow, rather than for just "small code at rest".
It looks like it parses the schema to find type information (a 2-minute skim, so please forgive me if I got it wrong!). How does it handle schema changes?
I think there's often an underlying confusion between different tools and what they are supposed to do.
An ORM - as the acronym says - is helpful to map database records to objects in the system.
The meaning of the acronym already says that an ORM is not really designed for scenarios like aggregations and reporting. Within those contexts, you don't normally reason in terms of lists of "objects" and "relationships" between them.
A "SQL builder" gives you a nice programming interface to build and manipulate SQL statements. Manually building complicated SQL strings is tedious, error prone and it makes it hard to reuse the same queries. With a SQL builder instead you can easily add dynamic conditions, joins etc, based on the logic of your application. Think of building a filterable Rest API that needs to support custom fields and operators passed through the URL querystring: concatenating strings would be hard to scale in terms of complexity.
Some people prefer to use templates instead of a SQL builder to add conditions, dynamic values, select fields, etc. I personally find that this approach is like a crippled version of a proper SQL builder interface. I prefer the expressiveness of a real programming language to some (awkward?) template engine syntax.
I think the confusion between these two different tools is caused by the fact that in some popular frameworks, such as Django or Rails, you just get to use the ORM, even if behind the scenes the ORM uses an internal query builder.
Other ORMs, like SQLAlchemy, instead give you both tools. You can use SQLAlchemy as an ORM, and you can also use it directly as a SQL builder when the ORM abstraction doesn't really work.
Normally, if someone tells me that it's better to write SQL queries by concatenating strings, I'd ask them how they'd build a webpage that filters products in the catalog with a series of filters specified by the user (by price, by title, by reviews, sorting, etc.). Try and build that concatenating raw SQL bits, without making a huge mess.
Also, the "just learn SQL" argument may apply to ORMs, but certainly not to a SQL builder.
Query builders can be so good at making it easy to work with the database ... the popularity of ORMs over query builders is a really big collective reasoning failure in my opinion.
With a good query builder in hand, it is very unclear to me why anyone would ever want to use an ORM.
I like query builders, but if everything were done with query builders there'd still be an awful lot of duplicated business logic that needs to go somewhere. Maybe you replace the ORM with some sort of 'results act as <something>' abstraction, but things like that often work out much better with consistent scaffolding of some sort, which is mostly what successful ORMs seem to be.
Surprised to see no one has mentioned query builders (like http://knexjs.org - no affiliation). All the niceties of not writing raw queries but no abstraction leaks... Although I have found for anything more interesting (recursive queries, etc) there is no escaping raw queries and every developer needs to bite the bullet and learn SQL.
I see query builders as a learning tool for fresh developers, and as a job aid for business analysts. For some this can be the final destination, but for any developer I would push hard to get them writing SQL by hand ASAP.
A few weeks of suffering through DIY SQL is really the only way to fundamentally understand how the database is working for (or against) you. Once you learn it, it really does become like a second language. You can fade in and out of proficiency based on recency of exposure, but it is mostly like riding a bicycle. One other advantage is you can take it across any SQL platform without a second thought. Many ORMs have provider-specific compatibility woes to contend with.
I disagree. Knex isn't an abstraction over SQL itself so much as an abstraction over the tedious process of writing literal strings, which lets the developer treat SQL queries as code.
My process is generally - write SQL to find the best query, then convert the SQL to Knex for integration.
As a simple example, say you are making a simple query to the db, but in some edge cases you need another column to be selected. Knex lets you build the base query object and then chain an extra select onto it only when the edge case applies, before executing it.
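That conditional-select pattern, sketched as a toy chainable builder in Python (an illustration of the idea only, not the actual Knex API):

```python
class Select:
    """Toy chainable query builder; mutable like a Knex query object."""
    def __init__(self, table):
        self.table = table
        self.columns = []

    def select(self, *cols):
        self.columns.extend(cols)
        return self  # chainable: each call returns the same query object

    def to_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM {self.table}"

query = Select("users").select("id", "name")
include_email = True  # some edge-case condition known only at runtime
if include_email:
    query.select("email")
# query.to_sql() == "SELECT id, name, email FROM users"
```

Because the query is an ordinary object rather than a finished string, any branch of the code can extend it before it is finally rendered and executed.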
On a recent project, it was a weird inversion in terms of access... in order to keep the middle tier thin, and meet requirements that all data access happen through stored procedures... we pretty much standardized an interface with one input parameter (@json) and two output parameters (@result, @errorResult). In the end, all input/output was JSON and the database handled all data internally.
I don't really like it much, but it did make the API service super thin, and the "database guys" were happy to do it all in SQL. Of course, testing that beast isn't so fun, and there be dragons...
In the end, I tend to favor creating APIs with scripted languages that require less translation to work with the database side, and structure responses to match the expected data layouts for the API side. With node, I usually create/use a simple abstraction...
```
var result = await db.query`SELECT ... WHERE foo=${bar}`;
```
or
```
var result = await db.exec('sprocname', {...});
```
In either case, not really a need for a formal ORM here.
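A rough Python equivalent of that thin `db.query` abstraction (hypothetical `Db` wrapper, using stdlib sqlite3): the wrapper manages the connection, binds parameters instead of interpolating them, and hands back plain dicts.

```python
import sqlite3

class Db:
    """Thin wrapper: holds a connection, binds parameters, returns rows."""
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows become dict-convertible

    def query(self, sql, *params):
        return [dict(r) for r in self.conn.execute(sql, params)]

db = Db()
db.conn.execute("CREATE TABLE t (foo TEXT, n INTEGER)")
db.conn.execute("INSERT INTO t VALUES ('bar', 1)")
bar = "bar"
result = db.query("SELECT n FROM t WHERE foo = ?", bar)
# result == [{"n": 1}]
```

The tagged-template version in JS does the same thing: the interpolated values never touch the SQL string; they travel as bound parameters.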
Yes, but I'm not sure I get the benefit of decoupling the schema, I'm still coupling to the stored procedure interfaces, and still have to deal with the shape of input and results.
If the schema needs to change, in many cases the SP interface will stay the same, i.e. I can make changes and optimizations to the schema without changing the calling code.
But if it's a schema change where you have to update the SP, you still have to change code; it's just that the code is in PL/SQL or T-SQL rather than in another language. You still have to update code either way.
The sprocname approach gives you the option of granting your app user execute permissions on the SPs and nothing else, which is much tighter from a security standpoint.
I would have generally agreed with this sentiment until we took a leap of faith and chose Entity Framework 4-5 years ago. That, coupled with the power of LINQ has allowed us to do things we simply wouldn’t be able to do otherwise. Pass IQueryables around without realizing them right away. This leads to functional, composable queries. And the queries we generate would simply not be possible to be written by hand. If we need to write SQL, we can and do. But with good design and good indexes, we rarely, if ever, do. I was skeptical, but EF is a tool we have not regretted.
The problem in the article is that the author seems to be using the ORM as a complete replacement for SQL queries. When it's used like that, for sure it's going to be a hindrance.
How I use an ORM is to help automate some things, while on top of that I still write raw queries for the more complex cases. For example, I create base tables using the ORM, then lay out all the joins and other complexity in handwritten views. For queries, I use the ORM for simple things such as direct single-table queries or straightforward joins. When things get more complex, such as joining a bunch of tables or doing nested queries, I just use a raw query.
The thing about an ORM is that it should always be used as a complementary tool. Feel free to mix and match it with raw SQL queries as needed. Trying to do everything in the ORM is just too much; you end up spending too much time learning about the inner workings of the ORM instead of getting things done. So yeah, use it as a complement to raw queries, not as a substitute.
I’ve come to the same conclusion even working with ActiveRecord. SQL is very literally a domain-specific language for working with relational data; why would we go so far out of our way to eschew writing code in it?
Because the process of building that SQL, executing it, and bringing the data it returns into the object oriented universe is a tedious and fiddly pain in the arse.
1. “Building SQL”? It’s not clear to me what that means in a world where you are writing raw SQL rather than using ORM’s and query builders.
2. “Executing it”—this is called a SQL client library. If you install your SQL statements as stored procedures you can execute those procedures by name, for example. Not that hard.
3. “Bringing the data it returns into the object oriented universe”: I am typically satisfied with an array of hash tables. Funny story: what do you get when you use ActiveRecord’s find_by_sql with a query that returns columns not in the original table? You get a collection of objects each with the extra column monkey-patched to the individual object. In other words, a slower and stupider hash table that happens to have dot syntax rather than bracket syntax and a bunch of do-they-even-still-work-in-this-context instance methods attached.
I'm a SQL fan as well, having used it before there were ORMs. I still prefer it for complex queries. However, ActiveRecord is so nice for reducing start-up tedium with DDL and migrations, and especially for mapping relations to data structures (objects in this case).
> SQL is very literally a domain-specific language for working with relational data; why would we go so far out of our way to eschew writing code in it?
It is, and all ORMs I know use it under-the-hood. It is the boilerplate code by which SQL is issued to the DB, and results parsed into something usable by the rest of the program that people try to avoid writing over and over.
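That boilerplate is small enough to see in full. A sketch of the loop an ORM automates: issue the SQL, then map each row into a plain object (names here are illustrative, not from any framework):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def fetch_users(conn):
    # the "boilerplate" an ORM automates: run SQL, map rows to objects
    rows = conn.execute("SELECT id, name FROM users").fetchall()
    return [User(*row) for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
users = fetch_users(conn)
# users == [User(id=1, name='ada')]
```

Written once per query it's trivial; written hundreds of times across an application, it's the repetition people reach for an ORM to avoid.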
I used ORMs (JPA/Hibernate) in several projects/teams and the outcome is always the same: things get messy and overcomplicated, for a few reasons:
- they push developers to design super-normalized db schemas that look beautiful on paper but are horrible in practice
- the average developer has a very superficial knowledge of how ORMs work and this often leads to bad code/performance
- sooner or later you will find yourself fighting the "framework" because what you are trying to do does not fit their model (e.g. upsert)
In my experience, this whole idea of abstracting away the DB is faulty at its root. You want your code to be close to the DB so that you can use the latest and greatest functionality without waiting for framework X to support it.
I have found that jOOQ, or simply Spring's JdbcTemplate, is more than enough in most cases.
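The upsert case mentioned above is a good example of staying close to the DB: what many ORMs make awkward is a single statement in SQL. A sketch using SQLite's `ON CONFLICT` clause (Postgres syntax is nearly identical; requires SQLite 3.24+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

def bump(name):
    # insert-or-increment in one atomic statement: the classic upsert
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET hits = hits + 1",
        (name,),
    )

bump("home")
bump("home")
hits = conn.execute(
    "SELECT hits FROM counters WHERE name = 'home'").fetchone()[0]
# hits == 2
```

Doing the same through an ORM typically means a read-modify-write round trip, or dropping into the ORM's vendor-specific escape hatch anyway.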
Generally ORMs try to abstract away the database entirely. This is mostly fine for CRUD stuff where you really just want to persistently stash away something basic off host. If you could write perfect uncrashing programs you'd probably just keep them in memory.
As soon as you need to find things, especially based on their relationships, you'll need to be aware of at least what indexes you have available. Eventually you'll want control of the precise query.
JOOQ lets you compose queries programmatically without really hindering you from producing any query you want but not forcing you to glue strings together either.
First your tree needs an apple, then it needs a pear, and then a fig, and after a year or two of grafting other things on, you end up wishing you'd just created a cornucopia with a bunch of columns instead of endlessly joining fruit, because two-thirds of the time you need a mix of the fruit anyway.
I'm definitely guilty of over-normalizing at least some of the time, but I'm less convinced that it's related to ORMs so much as it is to frequently having to build things without knowing how related they'll end up being in the future.
We used to show the new developers on the team a query generated by Entity Framework: 23 pages long. We replaced it with a well-formatted query that fit on half a page. When I switched the team to Java + JPA, we started writing our own queries for anything that wasn't simple CRUD, and it has been wonderful for the entire team.
I think it's easier to learn SQL than learn an abstraction of a DSL.
And SQL has been relevant for about half a century; odds are good you'll continue to find use for that knowledge for the remainder of your career. Well worth the time investment.
I don't know. I've been building database backed stuff for 25 odd years now, and I've never experienced this Object/Relational Impedance Mismatch that everybody talks about in any of my designs. I sometimes wonder if it's just the approach I take that has ended up dodging that bullet somehow.
My initial design is always done in the database. Whether it's a little feature or a green-field new project on a blank sheet of paper, that sheet of paper is the Schema Designer of my db (or a schema.sql if I'm in postgres/mysql land).
Once the schema is nailed down, the object structure flows out easily. You can follow foreign keys, many-to-many tables and non-identity primary keys to figure out your children, relations, and inheritance. It's so well defined that for my own stuff I just point my code generator at it to get a pile of base classes and stored procedures for all the CRUD. (And re-run it at build time to ensure that everything always matches up.)
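Pointing a code generator at the live schema can be surprisingly little code. A toy sketch (illustrative only) that reads a table's columns out of SQLite via `PRAGMA table_info` and emits a class-definition string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")

def generate_class(conn, table):
    """Toy codegen: introspect the schema in the DB, emit class source."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    fields = "\n".join(f"    {name}: object  # {ctype}"
                       for _, name, ctype, *_ in cols)
    return f"class {table.capitalize()}:\n{fields}"

src = generate_class(conn, "author")
# src begins with "class Author:" followed by one line per column
```

Run at build time against the real schema (as described above), the generated code can never drift out of sync with the database, because the database is the single source of truth.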
So when it comes to pulling down records, modifying them, saving them, grabbing sets of them to spin through, etc. There's never any mismatch because what you get from the db will naturally look just like what you need. Because you designed it to be that way.
I may hazard a guess as to why so many people do run into issues, and it's because I notice that nearly every ORM I've seen in the wild expects you to define your schema someplace other than the database. It'll have some wacky XML config file that you're supposed to keep up to date, with even wackier "migrations" for when that changes. And it'll then either build your db schema for you or expect you to have something in place that matches what it wants.
But that's silly.
There's already a perfectly good place to keep your schema. In the database.
And I guess it follows that if you don't design your system with a sensible relational data model in mind, you might find that your object structure doesn't in fact fit in a database correctly. Could it be that that's what people are describing when they talk about Impedance Mismatch?
"impedance mismatch" I believe typically refers to the following:
* the fact that the records you can create with SQL are an unbounded combination of every field in every table; in addition, any aggregate or functions applied as well as any renamed fields will further enrich the set of row classes you are able to generate.
* the fact that with (standard) SQL you cannot create anything other than lists of records, whereas objects are directed cyclic graphs
* the fact that objects assume unbounded access to the said directed cyclic graph as if it were in memory, which is a mismatch with optimal SQL querying patterns (the n+1 queries problem)
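The third bullet, the n+1 problem, in miniature: naive object-graph traversal issues one query per parent row, while a single join returns the same data in one round trip. A sketch with stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'alan');
    INSERT INTO books VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# n+1 access pattern: one query for the authors, then one more per author
queries = 0
pairs_slow = []
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
for aid, name in authors:
    for (title,) in conn.execute(
            "SELECT title FROM books WHERE author_id = ?", (aid,)):
        pairs_slow.append((name, title))
    queries += 1  # one extra round trip per author

# the same result with a single join
pairs_fast = conn.execute("""
    SELECT a.name, b.title FROM authors a
    JOIN books b ON b.author_id = a.id ORDER BY b.id
""").fetchall()
assert pairs_slow == pairs_fast  # identical data, 3 queries vs 1
```

With an in-memory database the difference is invisible; add network latency per round trip and the loop version degrades linearly with the table size.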
Having migrations for updating the schema makes sense even if the schema is raw SQL and the migrations are too. It enables having multiple environments without the headache of keeping them in sync.
Change the schema, and run a data migration script if necessary. And, of course, check the change script into source control so that your build system knows to run it on the staging and prod dbs when the time comes.
Then simply build your project and fix any compile-time errors that arise when the base classes are all blown away and rewritten.
Extra points for keeping your column names in string enumerations so you can't ever get runtime errors from typos or having renamed a column. (all handled by the code generator, of course)
But yeah, even on my mature projects I still find myself changing the schema all the time. It's as easy to do as adding/changing a class in the project itself.
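The string-enumeration trick for column names might look like this in Python (hypothetical names): every query references the one definition site, so a renamed column breaks loudly at the enum rather than silently at runtime.

```python
from enum import Enum

class UserCol(str, Enum):
    """Single source of truth for column names; a rename touches one place."""
    ID = "id"
    NAME = "name"

# queries reference the enum, never a bare string literal
sql = f"SELECT {UserCol.ID.value}, {UserCol.NAME.value} FROM users"
# sql == "SELECT id, name FROM users"
```

A code generator can emit such an enum directly from the schema, so a schema rename regenerates the enum and every stale reference becomes a compile-time (or import-time) error.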
I believe you've just described migrations (scripts that modify the schema and run data migrations, checked into source control).
Generating the ORM layer from the database schema seems OK to me. It does preclude generalizing certain subsets of the schema; e.g. let's say you want to publish a "comments module" reusable across projects that would install its own subset of tables in the DB, as well as provide functions (procedures) to create new tables that link comments to other entities on demand.
The problem with relational databases is that SQL has such poor facilities for abstraction. Whereas the typical language today has higher-order functions (some even have higher-order classes!), stored procedures are quite limited.
Code thinks in objects and functions and values and pointers.
Databases think in tables and columns and rows and queries and indexes.
If you don't pick an ORM to help manage this translation layer, then you'll end up re-implementing your own. Maybe this is OK, because yours will be simpler for quite some time.
What else are you going to do? Stored procedures? Concatenated strings?
No, you don't end up re-implementing your own, you end up implementing 10% of one. The 10% you need. Maybe it maps rows to objects, but doesn't handle object identity right, because that's not what you need.
The ideal for me is something that manages connections, lets me write the SQL, and gives me easy access to ResultSets, perhaps in the form of an object.
And yes, stored procedures are extremely useful, especially for minimizing chatter between the server and the database. Use them appropriately. (Although I don't see how they are an alternative to ORMs.)
Experience tells me you really need two layers. One for translating databases to objects, second one to translate those objects into proper domain objects.
The way code thinks about objects, behaviors and relationships is unlike the way you need to have it stored. Trying to use ORM-generated objects directly in a more complex business logic is, I learned, a recipe for disaster.
And since you're already writing two layers - data translation layer and actual domain layer - then, one may wonder, why not skip the first layer and implement the domain layer in terms of SQL queries?
I always feel that the argument against ORMs is like the one against frameworks, or against using "Vanilla JS". Saying you don't use one really means you're just writing your own.
I don't want to lose minutes or hours debugging a stupid typo.
> * I don’t test much
Writing meaningful tests with my object model without having to care about bizarre SQL semantics.
> * I don’t care how much time and bandwidth it uses
Because the ORM will optimize the queries for me, cache results, and perform lazy loading when possible.
> * I only know Java (or C#, or whatever)
Using the full power of expressive languages to work with my object models, keep things DRY and reuse code, and have support for different storage engines and sql flavors.
My personal gripe about ORMs is that they have two main usage patterns, and each of them has major drawbacks.
The first pattern, common in frameworks like Django, is to cram the business logic into the ORM instance objects. This creates a tight coupling between the two separate concerns (business logic and data persistence), and will cause problems as soon as the two structures deviate from each other.
The second pattern is to keep the ORM as a purely data storage layer, which is much more flexible but commonly causes duplication of nearly identical structures in separate layers.
I learned SQL at school. When I was hired and suggested to use an ORM, the boss didn't know what an ORM was, so he said no.
I have no problem with SQL; I actually like the language, it's very powerful. However, mapping query results to objects is soul-crushing.
So a few years ago, when I became the sole dev in the company, I decided I would try Entity Framework (Microsoft's .NET ORM) on a new project that needed to be delivered yesterday.
To get even more speed I didn't even design the database as I do usually, I went with the code first approach. It was magical, so much less work to do.
Then something went wrong with Entity Framework: it didn't retrieve the right data, I think (sorry, I don't remember exactly what, but it was a showstopper), and I couldn't find a solution to my problem online.
Then I had to do everything by hand in a rush, I was very late with the project.
I kind of lost interest in ORMs after that but I thought there should be a way to map queries to objects.
So I started to develop my own micro ORM with Tuples. I went nowhere fast but while I was searching I discovered that what I wanted to do already existed and was actually named a micro ORM.
Now I use Dapper (Stackoverflow .net micro ORM) and I'm very satisfied with it. Its level of magic is relatively low, so if it doesn't work I can quickly replace it with a manual query without losing much time.
It amazes me how much consensus there is now about ORMs being an anti-pattern. I had figured this out over a decade ago; back then ORMs were so common and popular that even the most senior developers thought I was a fool.
There was literally no choice; I had to work for several companies which were using ORMs. I dealt with the reality of the industry by specializing on the front end.
Those were the dark ages for back end development.
Nowadays front end development has taken the lead when it comes to insanity; React, Babel, GraphQL, Webpack, CoffeeScript and TypeScript... All tools/frameworks which add negative long term value.
I switched back again to back end development a few years ago to escape the front end madness... But now back end development is also starting to degrade; pure functional programming and languages which transpile to JavaScript (to run on Node.js) are the latest diseases. Thankfully back end development is more fragmented and there is more room for different tools.
If this continues, I will have to give up on software development and switch to consulting; I will be forced to sell complex (but popular) tools which create problems for businesses, and then offer to sell them a solution to some of the problems I created. It's not a joke though; business people really are becoming THAT credulous.
I mostly agree. But lately I've been using Ecto, and I like the approach. It's mostly an SQL generator. So far it's the sanest approach to ORM I've used.
ORMs are a leaky abstraction -- there is a sweet spot (mostly basic queries/relationships) where they provide lots of value, but there is a veritable infinity of scenarios beyond where they are a hindrance. Also they tend to introduce lots of hidden complexity when they try to introduce flexibility to match the landscape.
I'd argue that most of the time what you really need is a query builder with the right combinators, and maybe some higher order combinators to wrap those together for even better ergonomics.
For those writing Node.js, one of the best ORMs I've found is TypeORM[0]; check it out.
Here are some reasons I like it:
- Typescript-first
- Provides the repository pattern (ex. repository.findOne) & the entity manager pattern (ex. manager.findOne) so you can use whichever you prefer
- Annotation-based entity markup
- Ability to drop down to raw SQL query easily at any time
- Query-building ability, with special handling for relation queries; it even supports building WHERE clauses with combinators like and/or
- Great documentation
- Ability to programmatically use the introspection that it does for you (accessing the metadata from code is trivial)
- Handles migrations (you can generate them, run them, etc), easy to use programmatically as well
- Supports some low level concepts from my own favorite RDBMS, postgres
It lets you create beautiful SQL-like declarations:
```
payment_types = from(s in Ecto.assoc(order_cycle, :splits))
|> join(:left, [s], p in assoc(s, :payments))
|> select([s, p], %{
type: p.type,
card_type: p.card_type
})
|> Repo.all()
```
It's the first database library that makes queries easy to compose. On top of that, transactions are a snap to put together thanks to Ecto.Multi.
Codd's relational model on its own is a more beautiful, flexible, and concise system of data representation than the object oriented model. SQL is a rather poor implementation of it, but it will do, in most instances, for the sake of compromise.
What it is missing is a model of _behaviour_.
I've always (ok well, not always, but for the last 15 years probably?) felt the missing piece in what we call 'middleware' isn't resolving the impedance between object orientation and the relational model, but somehow coming up with a method of representing behaviour/execution for the relational model itself, and doing away with all these hierarchies of classes and whatnot. Something conceptually like what we call a method/multimethod in OO, but consistent with the mathematical, set-based lingo of the relational algebra.
But I'm probably not smart enough to figure that out. I hope someone does. I don't think 'stored procedures' is even close to it. Maybe Datalog has it, but I haven't grokked it well enough to say.
One of the biggest problems with ORMs is that they encourage unstructured SQL access. What queries does the application run? With an ORM, it's impossible to know, because relation-fronting objects are available everywhere in the code, and that means any module could be composing or adjusting queries.
A logical thing to do is restrict these kinds of actions to a database module, which exposes a function for each query we want to run against the database. This is a lot like the stored procedure model, with the procedures in the app (and adjustable from the app) instead of in the database. With a structure like this, it isn't so bad to write all queries as SQL files with some template parameters. Maybe there is repeated logic but there is usually a SQL templater for your language that lets you template in table names. There is abstraction, there is clarity about what code is being run, there is control over performance, there is syntax highlighting.
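That restricted database module can be sketched very simply (hypothetical module and function names): every statement the application can run is a named constant in one file, so code review and performance work have a single place to look.

```python
import sqlite3

# queries.py -- the only module allowed to touch the database.
# Every SQL statement the app can run is enumerated here, in one place.

ACTIVE_USERS = "SELECT id, name FROM users WHERE active = 1"

def active_users(conn):
    """One exposed function per query, like an app-side stored procedure."""
    return conn.execute(ACTIVE_USERS).fetchall()

# --- usage ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "ada", 1), (2, "alan", 0)])
users = active_users(conn)
# users == [(1, 'ada')]
```

Callers never compose SQL themselves; they call `active_users(conn)`, so the question "what queries does the application run?" is answered by reading one module.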
We used a third party tool for mailers and it was having severe performance problems after moving the server to a cloud based solution.
We confirmed what the DBAs said: the latency was only 34 milliseconds round trip and the database was pretty fast. I installed Wireshark to see what was going on. It was querying a table with 80,000 entries, and then for each entry it would do another "select ... limit 1" query to get the metadata on each record. This loop was abstracted away in an ORM, and the third party had very few clients with tables of our size for that particular table.
When it was run on prem it wouldn't take too long, but when it was on cloud it would take 45 minutes. Insignificant amounts of time being spent thousands of times ends up being significant time spent, and this can creep up on you as more things move to be serverless / API driven. A bunch of money and time was spent on figuring out the source of the issue.
I'm kind of curious about what data the select limit 1 queries were returning that the ORM couldn't get in a single query and had to go back per record. Can you shed any light on this?
ORMs have a degree of flexibility. It's possible to write n+1 queries accidentally, especially for developers who are new to the ORM. It's also often possible to address those issues. Sometimes trivially, sometimes not.
The other thing that comes to mind here is that maybe the query was hitting a non-covering index and that was triggering the lookups? In that case a covering index would have fixed the issue?
Maybe it's because the problem isn't directly caused by ORMs but just a very poor usage of ORMs. The same problem would have existed if a loop was written to perform the same query.
Sure you can make arguments one way or another regarding if a hand-written block of SQL would have the same flaw, but if an experienced DBA or developer writes it, I would bet on their output way over anything an ORM outputs.
If you consider the software development process as a whole, the explicit SQL approach intrinsically guarantees additional scrutiny of the actual SQL statements. If you hide all of this behind an ORM, all that is seen during code review time is some beautiful POCO model with a few extra IEnumerables thrown on it. No one is paying attention to the man behind the curtain and the horrible join that was just created automagically.
Perhaps the answer is to just log and profile all the ORM generated SQL - Sure, but if you could look at the ORM's output (which in some cases is abusively large) and quickly determine if its good or not, why not just write it yourself and be 100% sure from the start?
You literally need the same piece of knowledge to avoid that n+1 problem in both raw SQL and ORMs: you have to use a join.
Developers were making n+1 mistakes with raw SQL for years before ORMs became popular.
It doesn't matter if the "DBA or experienced developer" makes a select query that is better than the ORM. If this select query is inside a loop then all bets are off already.
I've followed ORM like/hate flamewars as long as I've been in the industry, and I think I'll be following them for as long as I continue. It's really an interesting and ultimately irresolvable tug of war between abstraction, automation, and a number of other issues.
I think people who have used ORMs a long time do not see them as a SQL replacement or as a total database abstraction layer, but as an automation tool for CRUD operations, with some capabilities of doing interesting things with querying, and potentially a methodology for managing database schema and migrations (depending on the tool). At their best, ORMs can tremendously reduce one's requirements for unit tests in particular areas because they have enough structural metadata to typecheck (automatically or compiler time) all the way down to forms in interfaces.
BUT, without a doubt, ORMs are indeed a tremendously leaky abstraction. That said, I would argue that every data store interface is, in that sense, a leaky abstraction. No matter the datastore you're using, be it RDBMS or a specialized NoSQL / search store, you need to learn the hows and whys of how it is structured. Not only should a professional learn SQL, but they should get at least a layperson's knowledge of how the database underneath the SQL works. However, the argument still stands, because when debugging a complex ORM query, you are debugging how it puts SQL together, and then you need to debug what the generated SQL is doing. So you're now multiple layers away from the actual thing you're managing.
Hence the 'final form' of my day-to-day ORM stance: I like to have an ORM around because of the massive automation around entity manipulation, but I think of the ORM in terms of the database and how the database should be structured, rather than as a generic domain model that happens to be mapped to some behind-the-scenes database. Furthermore, I believe it is quite valid to drop down to SQL for the very complicated stuff, especially if you want to trust to the query planning of the database. Once you do, you've broken the total safety of the abstraction, hence my ultimately seeing ORMs as automation tools rather than abstraction interfaces. I have no blame for people who refuse to utilize them, though I'd argue in a design meeting for their use, and hope I wouldn't have to take on all the CRUD queries if I lost the argument, because I'd start writing code to generate them, which then would become a terribly under-engineered faux-ORM :).
"I'm seriously you guys." I was recently working with a Prolog-SQLite binding when it suddenly dawned on me that I should just skip SQL and use the Prolog engine directly as the DB. Modern Prolog systems can handle millions of records, for a lot of applications that's all you need. On the other hand they also have e.g. ODBC bindings.
SWI Prolog can internalize data in flatfiles
"Managing external tables for SWI-Prolog"
SQL just isn't composable. I know this article is old, but these days it's not black and white. In the space between ORM and raw SQL there are things like AREL which can save a ton of dev effort without the "impedance mismatch".
Composing in this context means that one part of your application (e.g. the list controller) builds part of a query (the select-from), another part (e.g. the filter controller) builds another part of the query (the where clause), and yet another part of your application (the paginator) alters the where clause, adds limit and offset, and builds another query based on the same filter conditions to calculate the total row count. All that without the different controllers knowing about each other in advance. This is not possible without complicated string manipulation or building some not-quite-SQL query algebra.
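That composition can be sketched by passing the filter around as data (the "controller" names below are hypothetical): the same where clause and parameters feed both the page query and the count query.

```python
import sqlite3

def filter_part(min_n):
    """Built by the filter controller: a where fragment plus its params."""
    return "n >= ?", [min_n]

def page_sql(where, limit, offset):
    """Built by the paginator (limit/offset inlined here for brevity;
    bind them too in real code)."""
    return f"SELECT n FROM t WHERE {where} LIMIT {limit} OFFSET {offset}"

def count_sql(where):
    """Reuses the exact same filter conditions for the total row count."""
    return f"SELECT COUNT(*) FROM t WHERE {where}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

where, params = filter_part(5)
page = conn.execute(page_sql(where, 2, 0), params).fetchall()
total = conn.execute(count_sql(where), params).fetchone()[0]
# page == [(5,), (6,)], total == 5
```

Neither the filter code nor the paginator knows about the other; they only agree on the (clause, params) shape, which is the essence of what query builders formalize.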
Some API, like prepared statements (supported by the db itself), for building up complex queries step by step would be nice. Does something like this exist?
As the other answer to you says: subqueries will do that. DBs automagically unnesting subqueries has been a thing for a long time, so usually you won't get any performance impact (With a big warning for correlated subqueries).
Neat, not having used rails ever, I didn't know about Arel. Speaking about composable sql, I've had some good success with dbt lately. It has a feature where it lets you 'compile' a query out of multiple select models. Now I wonder whether it'd be feasible to use that dynamically, within the context of an application...
> My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.
I feel the same way about wiki syntax as I do about ORMs.
While SQLAlchemy is great for getting the DDL done, my SQL-fu is such that I wind up just using SQLAlchemy as a glorified connection manager while I construct the SQL strings directly rather than muddy up a sophisticated query by translating it into Python.
The same holds true for Wiki syntax. If I'm already proficient at HTML, it buys me precious little to use this 'easier' syntax if sometimes I'm in Redmine, sometimes I'm in Confluence, or wherever else I land.
In terms of wiki syntax, HTML is just so damned noisy that I'm fine with Markdown, but mostly because Markdown is fairly standardized and I don't have to learn a totally different syntax for everything.
In these cases, I’ve elected to write queries using a templating system and describe the tables using the ORM. I get the convenience of an application level description of the table with direct use of SQL. It’s a lot less trouble than anything else I’ve used so far.
I was with the spirit of the article save for this.
Recently I've been doing some work for one of my clients in the .NET world. There had been ongoing discussions about developers wanting to use Entity Framework as an ORM vs. using stored procedures.
The client already uses SQL projects and DACPACs (effectively a system for specifying the DB in SQL then diffing it between versions to alter the database).
The arguments for and against ORM were largely naive from each side of the fence - the developers wanted strong typing (doesn't require an ORM) while the DBAs wanted to be able to review query execution and suggest changes if there's an issue (you can see the generated SQL for the ORM).
The solution I came up with was to use reflection on the build server as part of the CD process to map the inputs and outputs of the stored procedures into strongly-typed C# methods and objects, generate code for it, and build/pack/push a nuget package back to our feed. It's similar to what EF provides without maintaining the EDMX (which we can do as we don't need to cover every possible data access scenario like EF does). It means that once the database project is checked and the build green-lights, an updated Nuget package that constitutes the DAL is automatically waiting on the internal Nuget feed.
I've found it gives us the best of both worlds. We can do anything we need to in t-SQL and the C# wrapper only cares about the ultimate input/output. I can force the use of parameter sanitization in the wrapper (by not providing any other way to call the procs), and the DBAs can review/amend whatever they want without the developers needing to change their code as the interfaces don't break.
We also don't have to write DAL boilerplate or worry about inexperienced developers getting it wrong and opening injection attack surfaces at the DAL layer.
There's usually a solution to your use case if you look for it is my point.
One or two blog articles featured here, clearly explaining the difference in philosophy/architecture, and the advantages it has over ORMs, and you will have an audience.
This is a great article, I've really enjoyed reading it! I want to react to a few of the points made in the post, and point out some potential techniques for solving them!
Most ORMs are heavily inspired by object-oriented programming, which fosters the object-relational impedance mismatch. In addition to logic for storing and retrieving data, models often also implement business logic (ActiveRecord is a good example here). This makes for bloated and complex objects that are difficult to work with in the application. A solution can be to lower the level of abstraction and use a more lightweight query builder (like knex.js for Node.js). These kinds of tools give you more control in constructing your queries as well as the ability to optimize them.
These tools still require you to understand quite a bit of SQL though and the productivity leap compared to writing manual SQL isn't as high. I believe that query builders are the best compromise we have today for accessing a database from an application.
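The chaining style these builders offer can be sketched in a few lines. This is a toy (not knex's actual API, and all names are invented) just to show the idea of a builder sitting below a full ORM:

```python
# Toy chaining query builder, illustrative only.
class Query:
    def __init__(self, table):
        self.table = table
        self.columns = ["*"]
        self.conditions = []
        self.params = []

    def select(self, *columns):
        self.columns = list(columns)
        return self

    def where(self, fragment, *params):
        # Each call appends a condition and its bound parameters.
        self.conditions.append(fragment)
        self.params.extend(params)
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params

sql, params = (Query("books")
               .select("id", "title")
               .where("author_id = ?", 7)
               .where("in_print = ?", 1)
               .to_sql())
```

Unlike a full ORM, nothing here hides the SQL; the builder only removes the string-concatenation bookkeeping.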
Regarding the dual schema dangers that are mentioned by the author, I strongly believe that these can be alleviated using code generation tooling that helps to keep your database in sync with your application models (approaches like SQLBoiler in Go where application code is generated based on the database schema are an example here).
In my experience, though, the cases where ORM's seem to be a good fit end up growing to cases where they're no longer a good fit - and getting away from them ends up hurting more than just starting without one to begin with.
You have to distinguish between heavyweight ORMs that impose their own data model and lightweight ORMs which just provide transparent query builders and static typing for query calls from your given programming language.
The former is frustrating to use in my experience but the latter tends to make SQL easier to work with without taking away any of the power and expressiveness of SQL.
Composable ORMs (for me, AR+Arel, this was about 4 yrs back) let me write some of the most magical, legible query code I have ever seen. I could create and reason about subqueries with highly dynamic filter options, pass them around, and wrap them with outer queries, all in perfectly readable code. Glorious.
Of course, for complex analysis like in specific periodic data reports, where the filter parameters are mostly known and don’t change constantly with the model, there are diminishing returns to this.
However, when you start writing code that writes other code using string concatenation, big-picture-wise I think you are doing it wrong. Look at HTML and how it has developed towards client-heavy apps as another example. Encoding is hard, dangerous stuff, and having a library do it for you (like the W3C DOM APIs, or higher abstractions like React) can be invaluable and can make the difference between spaghetti code and gorgeously understandable functional statements.
I think a lot of this is just some weird expectations. You cannot _use_ a database without knowing sql. ORMs work great, but you have to know sql to use them properly. What if you want to load a relationship with a join? What if you want only to load certain columns?
I would consider knowing SQL part of the profession. This whole "ORMs are bad / long live the ORM" split attitude is ridiculous. ORMs are fine. Some ORMs are bad, just like some code is bad, and some frameworks are poorly thought out. I have worked with developers in the past who I _desperately_ wish were forced to use a good ORM like ActiveRecord, so that they could understand just how far you can get with a solid, good pattern. I've also worked with developers in the past who used an ORM like a 15kg sledgehammer, and had absolutely no idea what was going on.
Have any of the naysayers used Rails' ActiveRecord? I used to not like ORMs much, but honestly it's a whole different ballgame. Very easy to slip into SQL for bits (or big pieces) as well. People don't really complain about it in the Rails world (my respect only increases) and it has a lot of miles.
In my prior job, I worked with a few (really skilled) interns that were just finishing up their CS degrees. I found out that they didn't learn any SQL at all and their interactions with databases were severely limited overall. This really surprised me, as I had a whole databases course when I went through college (different school, but in the same state, just 20 years earlier). Is teaching SQL or relational database concepts no longer common at colleges? I should mention that the school they went to was otherwise excellent and taught a great many things my own schooling never touched on and I had to pick up later myself. I'm just curious if it's a matter of just that school, or if other courses have been prioritized across the academic field generally for some reason.
I've looked at ORMS mostly on the periphery for years and years now, but haven't really been able to use them in our environment and this might be similar to the situation for others as well.
One thing I noticed early on is that even using a query builder complicated things a bit, since the original SQL string was broken up into multiple method calls. Because the full query wasn't built until the very end, a helper line had to be added to the code to get the full SQL query whenever issues were found down the road, so it could be taken over into our tool of choice (in this case SQL Developer, since we're dealing with Oracle queries in most cases) for additional testing there.
Our campus ERP is pretty complicated table-wise, so our queries (developed either by myself or by our systems analysts) tend to be fairly complex, requiring multiple joins and other complex logic that I feel would have a very difficult time being translated into an ORM format.
A query builder is still somewhat usable, but for the most part I just stick to mostly straight up SQL queries and make use of prepared statements to help avoid SQL injection and keep life simple so it's easier to move back and forth between the application side and testing the query on the database side :-).
On a side note, I'm not sure if it's mentioned here in the discussion (and it might be less of an issue now than in years past) but I have noted some ORM usage doesn't always choose the most efficient mechanism for things (e.g. pulling results using small, but relatively expensive multiple SQL queries rather than being contextually savvy enough to know that a set-based query would be better suited to retrieve the entire set of results at once).
Overall though, working with SQL and coming up with solutions for things is probably one of the funner aspects of my current position and while using an ORM has sounded like fun in the past, it just hasn't seemed like it's the best fit for our particular workflow/environment.
Rather than having a full ORM, a SQL system that didn't put in variables by embedding them in strings would be useful. You can almost do this. SQL has variables, but they're more persistent than needed for this.
    SELECT a,b,c FROM tab WHERE a=@mysearchkey;

An API should look something like

    result = sql->command("SELECT a,b,c FROM tab WHERE a=@mysearchkey;",
                          {"@mysearchkey": val})

and the result should be a key/value form. In some languages, you might get a typed structure back. No more string escapes. No more forgetting the string escapes.
If you required that the command had to be a constant, SQL injection attacks would be a thing of the past.
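This API largely exists already in most drivers. A sketch with Python's stdlib sqlite3, using a named parameter: the query text is a constant, the value travels separately, and the result comes back in key/value form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (a TEXT, b INTEGER, c INTEGER)")
conn.execute("INSERT INTO tab VALUES ('x', 1, 2)")

# Rows come back as mappings rather than bare tuples.
conn.row_factory = sqlite3.Row

# The SQL text is a constant; the value is bound by the driver,
# so no string escaping is ever involved.
row = conn.execute(
    "SELECT a, b, c FROM tab WHERE a = :mysearchkey",
    {"mysearchkey": "x"},
).fetchone()
result = dict(row)
```

Other drivers differ only in paramstyle (`?`, `%s`, `:name`, `$1`), not in the principle.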
Almost every single native RDBMS API provides parameterized queries / prepared statements where the parameters are sent separately from the query text; MySQL's connector is one example.
String escapes should have been dead a few decades ago -- I don't think any modern platform requires it; they all support parameterized queries natively.
> I’ve found that reflection techniques to get the data definition are not worth it and I succumb to managing the redundancy of data definitions in two places.
Not sure if much has changed regarding the "reflection techniques" since 2014, but I think Postgres does a fine job reflecting. (Don't know much about others)
For example my favorite library Massive.js (a data mapper) depends completely on reflection. It allows its users to access tables, views, functions, extensions, and even enum types from its Javascript API, without the need for models. This completely solved the data definition redundancy problem for me.
I even made a small layer on top of it to get the constraint information too using the information_schema, and everything is working like a charm.
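The same reflection idea in miniature, using SQLite's pragma introspection from the stdlib (Postgres would query `information_schema` instead, which is what Massive.js does):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")

# Reflect the table definition from the database itself,
# instead of repeating it in application-side models.
# PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
columns = {
    row[1]: {"type": row[2], "notnull": bool(row[3]), "pk": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(authors)")
}
```

The database remains the single source of truth; the application discovers the schema at startup rather than restating it.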
As a refugee of Hibernate etc. I definitely agree with this. Python in particular already has data types that map nicely to relational data - dict, list, tuple, iterators - so mixing SQL calls directly in code works great.
Plus you get full access to the particular database's features, direct control & understanding of what's being executed when, easier query & performance tuning, no special 2nd pseudo-sql dialect to learn, no big extra stack of leaky abstractions to troubleshoot, etc.
One of the big pitches of ORMs is "you can change your DBMS mid project!" Which sounds cool and has its occasional applications but is something I've never actually needed in over 20 years of development.
I worked on some projects where we had to support multiple databases. Entity Framework and a few database views helped us keep it agnostic. It was very common in the pre-cloud enterprise world.
Uber did it recently, too. They changed from Postgres to MySQL. But I don't know if ORMs helped them or not.
There is friction here because it's at the meeting of two languages, instead of implementation-interface or implementation-service. People either try to hide the underlying SQL entirely at one pole, or attempt to embed a SQL-like DSL within the primary language. I think that modern programming languages ought to offer first-class support for embedding other languages or DSLs, an idea behind Language-Oriented Programming.
SQL offers the domain appropriate syntax, while the “host” language allows access to in-scope variables, functions, etc.
Of course there would be some more work in allowing a “mix” of two syntaxes. Another option is a query language as a subset of the language, sort of like C# and Linq.
"...They are an acceptable way to represent a data definition, but a poor way to write queries and a bad way to store object state." Rails perspective: Pump your brakes there, friend. This claim depends completely on your application's needs. I write web apps for research that don't need to scale and don't need lightning performance. I have built apps that ended up needing some optimized-SQL-versus-ActiveRecord experimentation, and sometimes writing raw queries is certainly worth it. AR has been evolving for 15 years, is extremely useful, and is all that's needed for many use cases. We're not all trying to do the same thing.
I feel like there should be a better term to distinguish an ORM framework from the kind of ORM that pretty much every competent developer uses (a hand-rolled one).
These “you don’t need an ORM!!!” posts always annoy me because even if you don’t think you’re using an ORM, you probably are. You aren’t mixing raw SQL statements into your business logic, are you? Probably not. You’re probably writing wrapper classes that “map” the “relational” data into the “objects” that your business logic uses, otherwise known as an ORM.
But there’s a big difference between that and an ORM framework that generates SQL. Realistically speaking, you _do_ need an ORM in all cases, but not necessarily an ORM framework.
I agree. Unfortunately each database has its own SQL dialect with small differences, so migrating to a different database becomes a problem when using plain SQL.
You also lose compile-time checks.
I'm trying to combine the best of the two approaches in the V language. It has a built-in ORM that uses SQL-like syntax:
uk_customers := db.select from Customer where country == 'uk' && nr_orders > 0
println(uk_customers.len)
for customer in uk_customers {
println('id: $customer.id; name: $customer.name')
}
The solution is to use something closer to a compile-time SQL or DSL, such as jOOQ or SqlAlchemy (sans the ORM pieces). These are terrific technologies that improve upon SQLs weaknesses and achieve some level of portability.
The "ORM or not" discussion quickly runs off track, because there are so many different ORMs with different features and working on different levels of abstraction.
Recently I have the most experience with EF Core (from .NET), which does not have the "select *" issue the article describes (you can select whole entities, but you can also project individual columns if you want). It uses LINQ for expressing queries, which makes it more concise and expresses relational algebra more clearly than SQL.
On the other hand I have tried Hibernate, which had such a verbose and cumbersome query builder syntax that it made you long for plain SQL.
I've found that DDD and a read/write separation of concerns simplifies things greatly.
On the write side, the ORM just returns an aggregate, usually based on the PK of the root. That's trivial for any ORM.
On the read side, simple queries can be modelled with ORM syntax if you're just trying to fetch a graph of existing objects. Complex queries can be returned with raw SQL that map to custom read models. I tend to wrap both styles in integration tests that ensure the query logic doesn't change, and that the actual mappings don't break.
Both ORMs and SQL have their usages, and they're not mutually exclusive.
> In these cases, I’ve elected to write queries using a templating system and describe the tables using the ORM. I get the convenience of an application level description of the table with direct use of SQL. It’s a lot less trouble than anything else I’ve used so far.
This feels like a good balance. I want to express my database schema via OO classes. It eases DB migrations as the application grows if you use tools like Alembic (same developers as SQLAlchemy).
But use care to avoid SQL injection risk with template queries. SQLAlchemy makes this easy using .bindparams, as does .NET via SqlCommand.
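One way to combine the two safely, sketched here with stdlib tools only: templating controls the query's *structure* (against a whitelist), while every *value* goes through a driver-level bound parameter. The table, columns, and helper name are invented for illustration.

```python
import sqlite3
from string import Template

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ann", 1), ("bob", 0)])

# Only structural pieces (a validated column name) pass through the
# template; values never do, so they can't inject SQL.
ALLOWED_COLUMNS = {"name", "active"}

def find_by(column, value):
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    sql = Template("SELECT name FROM users WHERE $col = ?").substitute(col=column)
    return [row[0] for row in conn.execute(sql, (value,))]

names = find_by("active", 1)
```

SQLAlchemy's `text(...).bindparams(...)` gives you the same split with much better ergonomics; this just shows why the split matters.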
ORMs can be nice if you're using the database for a single application in a very conventional CRUD pattern, but that's about it. And that's if you have the luxury of building a new DB to go with your new App. If you find yourself doing anything interesting (ie not just CRUDing single rows by ID) you should not only learn SQL, you really need to become familiar with databases. The promise of ORMs (at least when I first encountered them) was "you're not going to have to do SQL" but in reality, you will.
I'm old enough to remember when pretty much every business app developer knew SQL. ORMs didn't arrive because people didn't want to learn SQL, they arrived because there was a mismatch between 'relational database' and the emerging trend to 'put your business logic in an OO-powered independent middleware layer'. Unless you want to code all your logic into stored procedures and whatnot, you end up needing some sort of abstraction of the database structure that can manage business rules for you.
Methinks someone needs to have a go at maintaining a 2MLoC accounting package built using only embedded SQL statements and stored procedures, including migrating the whole mess between major database vendors.
I guess one advantage is that you have to learn it, but I really prefer some kind of ORM for the more mundane, repetitive CRUD. More to get a structured (ha!) interface between the database and the application than for the convenience.
I do wish most of them would stop insisting on putting the cart before the horse and making code the primary representation.
I've found that you need to first understand basic SQL very well. Then once you've mastered the ORM, it can dramatically improve your productivity. But you need to understand both to reason about what you are doing.
ORMs have their place. Simple CRUD microservice? An ORM can help tremendously. Complex reporting system? Probably not the right tool.
Be careful though. I've run into issues where once you're scaled way up and need the ORM to get out of the way, it can be a beast to detangle if you weren't disciplined.
I work a lot with SQL, not at all ORM's. (I'm on the data extraction, modeling, analysis side). So what's the counter argument here, where to ORM's excel?
The counter-argument is that for a lot of applications you have a bunch of tables and a bunch of objects and there's a lot of repetitive, boring code to write just to get simple CRUD working. The hobby Java ORM tool I wrote (http://hrorm.org) only does simple CRUD. Everything else it leaves to you using JDBC and SQL, since the problems being discussed here are real. But for managing persistence of simple object models, I think ORMs are a big time saver.
The way I'd put it is: "learn SQL then use an ORM". If you understand exactly what SQL is being generated and what the implications are, you can use an ORM effectively and benefit from it in many ways (readability, security, composability, etc.). You'll also know when to use it and when it's better to just drop down into SQL. It's when people don't really know what the ORM is doing underneath that they get in trouble.
And therein lies the problem. These ORMs (by and large) have become so complex and abstract that it's nearly impossible to understand what they're doing underneath (I'm looking at you Entity Framework (in .NET))
You wind up having to have a deep understanding of the ORM and SQL, at which point I would argue why bother with the ORM at all? For .NET, I'm much happier with a very very thin layer over the base .NET database client library called Dapper. Unfortunately most shops use Entity Framework by default.
"You wind up having to have a deep understanding of the ORM and SQL, at which point I would argue why bother with the ORM at all?"
I agree that ORMs are often too complicated, but imho it's still worth it to learn and use one. I'm using ORM loosely though in that I don't think heavy 'Object' and 'Mapping' layers are so important. Mainly you just need a reasonable way to parameterize and compose queries so you can avoid injection and deduplicate logic in a sane way. So I guess my argument is more "use a library" than "use an ORM".
In my experience everyone who says they'll use pure SQL ends up adding gnarly string building logic at some point because the duplication gets out of control. It's better to just find a decent lib that works at whatever level of abstraction you're comfortable with and use that.
Writing SQL is a lot of work, not type safe (in the programming language static typing sense), often not re-usable, and hard to test. There are libraries that solve these parts for you (or at least help with them). For me the optimum is libraries that do only these parts but actively try to NOT introduce any new layer of abstraction over the database model itself. The problem is that in this whole ORM discussion, the baby gets thrown out with the bathwater.
I’ve had success in Java with Freemarker. It’s a templating engine so you’re just producing strings. But you can add typing; use variables, loops, and conditionals; and include other templates for sub query reuse.
IntelliJ has a plug-in.
It’s a nice compromise between crafting strings vs SQL DSL.
As for the objects, you can get very far with everything being a Map until you really need to add a class or two. :)
Absolutely. One of the things I like about Groovy is that it solves the other half of that equation because it has the built in map constructor and built in SQL APIs that work with it. So you can actually do
class Foo {
Long id
String name
...
}
Foo foo = new Foo(db.firstRow("select * from foo limit 1"))
And it all just works if the database columns match the fields of Foo. And if you just do
Foo foo = new Foo(db.firstRow("select name from foo limit 1"))
Then you get a `foo` with only the name populated, etc, and you got type safety, easy direct, efficient SQL queries and the ability to test your code without hitting the database, all without imposing any "leaky" abstractions that cause all the problems.
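The same pattern is easy in other languages too. A sketch in Python, with a dataclass standing in for the Groovy class and sqlite3 standing in for the database (the `Foo` table and `first_row` helper are illustrative, not a library API):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Foo:
    id: int = None
    name: str = None

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE foo (id INTEGER, name TEXT)")
conn.execute("INSERT INTO foo VALUES (1, 'widget')")

def first_row(sql):
    # Columns map onto dataclass fields by name, like
    # Groovy's map constructor.
    row = conn.execute(sql).fetchone()
    return Foo(**dict(row))

full = first_row("SELECT * FROM foo LIMIT 1")
partial = first_row("SELECT name FROM foo LIMIT 1")  # only name populated
```

As in the Groovy version, selecting fewer columns just leaves the other fields at their defaults, with no mapping layer in between.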
ORMs make it easy to stand up databases and make doing migrations a lot easier. That being said, it's my opinion, after using them in personal projects, that the best ones do the boring stuff and are then best used as query-building tools, so you have some control over what the query looks like and actually does. That said, if done right I prefer thin abstractions over my data layer and raw SQL; I don't care for the data layer to be a black box.
Most modern ORM have escape hatch to let you write raw SQL.
Your favorite web dev language will have a dominant ORM. C#, Python, Elixir, Ruby all have a popular ORM that works within a popular framework.
An ORM will make working with the database and web framework easier.
I do agree with the post that querying in an ORM can be hard and sometimes not possible without raw SQL. Writing queries for Ecto, Elixir's most popular ORM, can be tricky when you want to do dynamic queries.
ORM is like a regex. When you use it, you've got two problems. The problem with having two problems is that we love solving problems! Yay OCD! So if an ORM doesn't let us do something, instead of just getting the work done with a couple of quoted SQL strings, we try bending the ORM to our will. I know I've done this with the entity framework. And when we do that, the perpetual debate starts all over again.
Why do I have two problems if I use a regex? "Programming Perl" and "Mastering Regular Expressions" were what drew me into programming in the early 2000s. Whilst ORMs are optional, regular expressions are not, so I don't get your point.
I think that my primary issue with ORMs is that they often allow you to code as if the database doesn't exist, instead of explicitly structuring code to handle network and database failures at the points where you deal with external data.
It is a problem if potentially any part of your program can just fail, and that happens if you allow magic ORM-objects to leak onto code paths that aren't written to handle them.
In my career, I have used Entity Framework, ActiveRecord, Sequelize.js and Hibernate. I haven't had a problem with any of them, especially not with Postgres and MySQL. I've worked in teams of 10+ and on personal projects that I've completed in a couple of days. My advice is to only use the best ORMs. I won't use an ORM unless it's at least 5 years old.
Personally, I feel like a very similar article could have been written called 'What CoffeeScript has taught me: just learn Javascript'.
I have nothing against coffee or typescript (and other alternatives) and think they're very useful, but at the end of the day it's really just javascript. But I guess you could make that argument about anything until you get down to machine code.
For simple cases it is almost always a good decision to use an ORM if you don't know SQL. The question is whether and how long the ORM shoe will continue to fit. Databases are essential for many use cases so the chances are that an ORM will not always work. So decide on that basis when to learn SQL. Maybe not today, maybe not this project.
For anything complex, definitely. For anything simple, eh.
It also feels to me like he's talking about a specific ORM. I know ActiveRecord has its fair share of issues, but from what I know of AR usage and implementation, it either doesn't do what he's (legitimately!) complaining about, or it does it in the way he's suggesting.
Ah, Java. Tooting my own horn a bit, I wrote a custom YAML/XML/JSON parser for C# specifically because the existing parsers required me to turn the data into objects before being able to do anything at all, rather than allowing me to interact with the data just as data. Say, cleaning up and/or validating the data.
This is a common problem I've run into in strongly typed languages; can't "just have some data", have to have an object :/
Whether or not you love/hate ORM, I think the advice at the very end is still applicable: "If you’re using an RDBMS, bite the bullet and learn SQL." This knowledge will help you write SQL if that's what you're doing or debug the SQL that the ORM writes for you. Either way ... learning SQL is a good thing!
Using both should be fine. SQL alone encourages early optimization that might slow down development, especially when you don't have the complete non-functional requirements. It sounds like a pitfall to compare it against using an ORM. The way I see it, an ORM stands up well when it's understood as an additional capability.
Working with SQL was nicer in PHP. It gets more difficult for some reason in Node/JS. SQL injection would always be a concern by default when writing raw queries. Though I think I could avoid writing flawed queries/endpoints, it's harder to expect that from a team at scale. #foodforthot
ORMs have taught me to not rely on ORMs for critical storage. I have one product that uses ruby DataMapper. With my particular product (embedded industrial equipment) it’s terrible. I have to constantly fight it to correct MySQL problems.
If they had just used basic sql it would be much easier to refactor.
What are some good rule-of-thumbs for when to use ORM vs SQL?
I have worked with SQLAlchemy and Entity Framework, before and like them, but haven’t been able to find that magical demarcation line for when to go raw SQL.
Does anyone have basic rules they use for determining this? Applicable to MVP or enterprise level products
The more joins you have, or when you need more performance, drop into SQL.
SQLAlchemy's expression language is a good middle ground. I will start there if I need to do more than a couple of joins.
As some people have noted here, I think the biggest problem is that people who only know ORMs will have trouble because their understanding of the database will be limited by their lack of SQL knowledge.
An ORM is not a substitute for learning SQL. It's just a nice tool for more tightly integrating your database and your application code. An ORM can make your life much easier, but if you don't know SQL or how a relational database works, your ORM isn't going to save you.
I don't want an orm to keep from learning SQL, I want one to save me from having to write reams of boilerplate code copying fields one at a time from an SQL result to my language objects/structs/etc and then back again when I want to save them to the DB.
When I started web work ~9 years ago, it was with Rails and ActiveRecord, which turns out to be incredibly good for basic queries and pretty basic apps. So good to the point where I never bothered to go too far into SQL until years later, which was a mistake.
When doing work in Python, I don't feel it has a comparable ORM, so I kind of write my own files that have things like finders and updaters and creators. In most cases, I've found it's much better than SQLAlchemy. Dealing with joins, and things like math, meaning averages, sums, distributions, division, is so much better done in the query rather than with more basic queries and looping through the results.
There are of course cases like injections to look at, but lots of things I have are calculations of data and showing it, so we're able to handle it with raw queries. Also, ActiveRecord has ways to enforce no injections.
In lots of cases, we've found that using an ORM to start, finding slowness, and moving towards raw queries is a great way to go.
Current rails developer here, I've found that using ActiveRecord for basic queries and moving to SQL for more complex stuff to be a very good combination.
I agree that ORM users should learn SQL, but SQL and an ORM are not mutually exclusive.
Yep, also current rails dev here and this is also what I do :)
Learning SQL with an ORM does remind me a bit of learning memory management with a garbage collector: helps you make better choices as to how you put things together, even if you never directly use the know-how.
We use ActiveRecord, which, from my perspective is a good ORM. It's very simple. If you are diligent you can use pluck and count and a ton of other features to ensure you are selecting only what you need. It's easy to drop down to sql in partial form or in the full form. We tend to not get too crazy with AREL.
In particular I like that it behaves way better using surrogate primary keys (serial id column) instead of natural primary keys... it just answers that question for you that would otherwise result in bike shed conversations.
We minimize logic in the AR models, instead those are in glue objects and this pattern works real well. AR models are mostly there for describing relationships in the code and doing data validations in the code on top of CRUD operations.
current rails dev here, i've found ripping out the custom SQL and replacing it with ORM equivalents offers the best performance and readability short of custom Postgres functions. The state of ORM in rails of 2014 vs 2019 is a big leap.
FWIW, I think the sweet spot for most web apps is to use an ORM on top of database views. This gives you the power and performance of SQL, but avoids the impedance mismatch of the ORM, since you’re defining what data will be on your objects at the database level.
ORMs often implement only the lowest common denominator of all supported databases, and easily changing databases is seldom a necessary requirement in a free-software world.
If you use an advanced, feature-rich database, often a lot of it is not supported by the ORM.
I don't know... when starting out or prototyping a new feature, I find ORMs useful for moving fast, and the performance good enough.
You can always slowly migrate to pure sql for critical parts of the system.
What happened to object databases? They seemed to have a brief moment in the spotlight during the 90s but haven't made much noise since. Bad product or just too much inertia with relational databases?
Yeah, I don't understand why developers feel they should use one over the other in every single case.
I guess there's a certain appeal to only using nails or only using screws, but at some point it doesn't make sense to compromise the quality of your code over ideological purity.
Using an ORM without knowing SQL is a recipe for disaster. And after you learn SQL, you will likely decide to go back to ORMs, because now you know how to use them efficiently and save yourself time.
The thing is, what an object is in the context of the relational model has been generally grossly misunderstood. For a detailed analysis, see C. J. Date’s The Third Manifesto.
I pretty much agree... I usually roll my own ORM, though, mostly for niceties like firstOrCreate and such. I always keep a sense of the actual MySQL being run behind the scenes.
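A hand-rolled firstOrCreate is only a few lines; here's a sketch over stdlib sqlite3 (the tags table is hypothetical), which makes it obvious exactly which queries run:

```python
import sqlite3

def first_or_create(conn, name):
    """Return the existing tags row for `name`, inserting it first if absent."""
    row = conn.execute("SELECT id, name FROM tags WHERE name = ?", (name,)).fetchone()
    if row is None:
        cur = conn.execute("INSERT INTO tags (name) VALUES (?)", (name,))
        row = (cur.lastrowid, name)
    return row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
a = first_or_create(conn, "sql")
b = first_or_create(conn, "sql")   # second call finds the existing row
```

Note this SELECT-then-INSERT has a race under concurrency; a production version would rely on the UNIQUE constraint and handle the conflict.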
Can't we all agree ORMs are just leaky wrappers over SQL? Almost all of them need a backdoor for issuing raw SQL to the backing DB so they don't become a blocker at some point.
I used SQLAlchemy and would not reach for it again, but you can prise DBIx::Class from my cold dead fingers, so I think _which_ ORM is an important consideration too.
Fun things are coming soon btw, I finally figured out a couple of design problems that have been annoying me for years and the code is close to stable :)
The article gives specific examples of difficulties using an ORM. It would be nice if people advocating ORMs here gave specific examples of how to overcome them.
My experience is that ORMs are great for avoiding boilerplate IF you already know SQL and treat them not as an abstraction layer but as syntactic sugar.
If you put unnecessary abstraction in front of a perfectly good solution, you will end up with problems.
Object-oriented programming is about clearly expressing business logic. There is no complex business logic in dumping big tables of data.
So it's not about ORMs; it's that object-oriented programming is a poor fit for reporting.
Relational databases, declarative SQL, and pure functional programming are good solutions for reporting.
And you should certainly learn SQL if you want to do any applications with ORM or without. I recommend Joe Celko's SQL for Smarties: Advanced SQL Programming.
Object oriented languages shine when there is a need to create a precise language that facilitates fast and robust communication between domain experts and developers. In situations where you have little data, but a lot of intricate logic. This is where relational databases are simply no good.
Relational algebra and SQL are not very expressive languages. SQL limits your vocabulary to four or five verbs unless you start writing procedural code in stored procedures etc., but SQL is not a good procedural language. Relational databases are a solution to specific technical problems of scale, execution speed, and atomicity, consistency, isolation, and durability (ACID). They excel at that, not at communicating intention.
You should use ORMs (preferably the Data Mapper pattern) if your goal is to solve the problem of expressing complex domain-specific logic. You use relational databases in that situation because they just work. Data Mapper lets you isolate your domain model from tricky technical aspects of data storage, like indexing or not corrupting files during a power outage. An ORM works very well as long as you can actually ignore technical aspects like speed in your domain model.
You can do the data mapping, querying, migrations, and all this technical cruft manually with a handwritten SQL if you want, but SQL certainly will not address very well the goal of creating an expressive domain model that facilitates robust communication between developers and business experts.
So, given that we addressed the elephant in the room, some other points:
Dual schema dangers: "I much prefer to keep the data definition in the database and read it into the application."
That's a perfectly good solution if you have a lot of data and the amount of logic related to it is minimal. If you have little data and a lot of logic, you will not be able to keep the data definition exclusively in the database, whether you are using an ORM or not. Even if you just store SQL queries in your code, you are still storing the schema structure, just implicitly rather than explicitly.
Data migrations, rolling releases, etc. are tricky whatever you do. I use my ORM to dump the SQL needed to move the structure from point A to point B (since the ORM knows the schema, it can do that), and then I manually adjust it as needed to massage data etc.
Identities:
Not sure if I follow here; it seems like you have issues with auto-incremented ids. Auto-incremented ids are indeed annoying with ORMs. My advice is to use UUIDs generated in code, so there is no need to hit the database. A side note: if you use ORMs, do not use anything with real-world meaning as an id. Use UUIDs, so that you are sure no domain expert will ever want to mess with them.
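A minimal sketch of the UUID suggestion with stdlib uuid and sqlite3 (the users table is hypothetical): because the key is generated in application code, the object knows its identity before the INSERT, and no round trip is needed to read back a database-assigned serial id.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

# Identity is assigned in code, before the row ever reaches the database.
new_id = str(uuid.uuid4())
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (new_id, "alice"))

row = conn.execute("SELECT name FROM users WHERE id = ?", (new_id,)).fetchone()
```

The trade-off is index size and locality: random UUIDs fragment B-tree indexes more than serial ids, which is why some databases offer time-ordered UUID variants.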
Transactions:
The same situation as with speed and migrations: getting the technical details of transactions right is hard whether you use an ORM or not. You can use stored procedures, but I'm curious what you will do when, for example, an external REST API needs to stay in sync, or user files must be copied, to ensure a full transaction from the user's perspective. There is no magic bullet; it's just hard.
I have fought with orms for many, many years. I think if you sit behind someone and watch them build an application with an ORM, and you sit behind the same person and watch them use SQL and data access functions and other boring things, the person using the ORM will spend many, many hours fighting to get the ORM to do what they want, or fixing bugs that pop up because the ORM didn't do what they thought it was going to do in subtle ways.
Generally, with threadlocal sessions and an application passing orm data class instances around the code freely (which is by far the most common pattern of use), the application will end up doing 10,000x more queries than the programmer would have guessed (this is a literal number and not an exaggeration). Trying to tell the ORM to preload the tree of objects that is going to be accessed is nearly impossible, since the instances go up and down the stack from function to function, each potentially accessing an attribute of an instance loaded as an attribute of an instance many levels back to the original object intentionally pulled from the database.
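The N+1 pattern described above can be demonstrated without any ORM at all; here's a sketch with stdlib sqlite3 and a hypothetical authors/books schema, counting queries by hand the way a lazy-loading ORM would issue them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'A'), (2, 'B');
    INSERT INTO books VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
""")

# Lazy-loading style: 1 query for the authors, then 1 per author for the
# books, i.e. N+1 queries that grow linearly with the data.
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
for author_id, _name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?",
                 (author_id,)).fetchall()
    queries += 1

# Eager style: one JOIN answers the same question in a single round trip.
rows = conn.execute("""
    SELECT a.name, b.title FROM authors a
    JOIN books b ON b.author_id = a.id
""").fetchall()
```

The hard part the parent identifies is that with instances passed freely down the stack, you can't know up front which relationship accesses will fire, so you can't reliably choose the eager form.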
ORMs make writing the application 90% faster for the first 2 weeks and then 50% slower from then on.
That doesn't mean you are stuck writing straight SQL queries and passing around rows of data, you can sit for a bit and build data access functions that make your life easier and write classes that represent entities from the data, but without your data going through tens of thousands of lines of (extremely well engineered and thoughtful) ORM code that you have no hope of ever understanding well enough that you will avoid catastrophic mistakes that are extremely hard to fix.
And you will make catastrophic mistakes, mistakes you would probably never make with SQL. SQLAlchemy has five states that ORM objects can be in. If you are using SQLAlchemy and you can't instantly tell me what those five states are, with a detailed description of each, you are already making huge mistakes and corrupting data.
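For the record, the five documented states are transient, pending, persistent, deleted, and detached. A minimal sketch (hypothetical User model, in-memory SQLite) walking one object through all five, checked via sqlalchemy.inspect():

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

u = User(name="alice")
transient = inspect(u).transient    # not in any session, no identity
session.add(u)
pending = inspect(u).pending        # in the session, not yet flushed
session.flush()
persistent = inspect(u).persistent  # flushed, has a database identity
session.delete(u)
session.flush()
deleted = inspect(u).deleted        # delete flushed, not yet committed
session.commit()
detached = inspect(u).detached      # no longer associated with any session
```

Each transition above can also happen implicitly (autoflush, commit, session close), which is exactly where the surprises come from.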
To quote one section from the many pages of SQLAlchemy documentation about 'session state management' (and if you don't know all this stuff by heart you will end up learning a lot of it the hard way):
==============================================
The SELECT statement that’s emitted when an object marked with expire() or loaded with refresh() varies based on several factors, including:
The load of expired attributes is triggered from column-mapped attributes only. While any kind of attribute can be marked as expired, including a relationship() - mapped attribute, accessing an expired relationship() attribute will emit a load only for that attribute, using standard relationship-oriented lazy loading. Column-oriented attributes, even if expired, will not load as part of this operation, and instead will load when any column-oriented attribute is accessed.
relationship()- mapped attributes will not load in response to expired column-based attributes being accessed.
Regarding relationships, refresh() is more restrictive than expire() with regards to attributes that aren’t column-mapped. Calling refresh() and passing a list of names that only includes relationship-mapped attributes will actually raise an error. In any case, non-eager-loading relationship() attributes will not be included in any refresh operation.
relationship() attributes configured as “eager loading” via the lazy parameter will load in the case of refresh(), if either no attribute names are specified, or if their names are included in the list of attributes to be refreshed.
Attributes that are configured as deferred() will not normally load, during either the expired-attribute load or during a refresh. An unloaded attribute that’s deferred() instead loads on its own when directly accessed, or if part of a “group” of deferred attributes where an unloaded attribute in that group is accessed.
For expired attributes that are loaded on access, a joined-inheritance table mapping will emit a SELECT that typically only includes those tables for which unloaded attributes are present. The action here is sophisticated enough to load only the parent or child table, for example, if the subset of columns that were originally expired encompass only one or the other of those tables.
When refresh() is used on a joined-inheritance table mapping, the SELECT emitted will resemble that of when Session.query() is used on the target object’s class. This is typically all those tables that are set up as part of the mapping.
SQL is to much software development what rivets, bolts, or even beams are to the construction of skyscrapers... if you’re going to be in the software business, you should really get comfortable with SQL, after which you’ll likely find the friction and headaches caused by ORMs to be a deal breaker.
> I do know one thing: I won’t fall into the “ORMs make it easy” trap. They are an acceptable way to represent a data definition, but a poor way to write queries and a bad way to store object state. If you’re using an RDBMS, bite the bullet and learn SQL.
I don't think i'm alone in saying I'd rather hire a developer who is native with an ORM and can dive into SQL when things get thorny, than hire one who's going to fight me on the very merits of an ORM at all.
I mean, hey, to each his own. But let's be clear: this is a ridiculous position, and if you're going to take it up, make sure you can still get hired while holding onto it.