I've come to a couple of conclusions over the years.
First, when you get down to it, the most-valued feature of ORMs is not the "writing queries in some language other than SQL" feature, it's the "not having to write a mess of mapping code" feature.
Second, the biggest drawbacks to ORMs all derive from the "writing queries in some language other than SQL" feature.
Fortunately, there are tools out there that solve the "mapping tables to objects" problem without trying to control all access to the database: so-called lightweight ORMs like Dapper and Sql2o. To me, these hit the sweet spot: they give you most of the increase in productivity while steering clear of most of the increase in complexity (and, by extension, decrease in maintainability).
The query DSL for ORMs is fine; it works well in the large majority of situations. When it gets problematic, you can almost always drop down to pure SQL.
The mapping part of the ORM on the other hand is a disaster. Table rows make poor OO classes. The best OO classes are "workers" that have some concrete task at hand. "Active record" style classes have no scope whatsoever - anything connected to the data can be potentially added to the model class.
So on projects with ORMs my main problem is dealing with the responsibility spaghetti. Typically one class, the central point of the domain, grows to crazy proportions. In document-related software, for example, you can potentially put everything in a "Document" entity class. "DocumentConverter", on the other hand, has a clear responsibility scope - to convert documents - but of course a converter doesn't have a database table.
So don't map DB records to objects, then. DB records are records, and their proper typing in your business logic is as records—chunks of plain old data, strongly-typed, that your (OO or otherwise) code can declare DB-side interfaces against. The only responsibilities of the module/class that owns the record type, should be getting things converted into and out of that record type.
Plain-old-data DB record types (and their respective owner modules) are to ORMs, as code-gen'ed wire record types (and their respective owner modules) are to ProtoBuf-like wire-format codec libraries. They're the containers you put data into and take it out of, to ensure it's in the right types when being sent to, or received from, your DB. No more than that.
And, corollary to this idea: the POD record-type MyDB.FooTable shouldn't be taken as a type to use to stuff in the results from any random query you run against the DB table `foo`. It should be taken to represent the specific row type that the DB table `foo` has—the one you get when you `SELECT * from foo`. If you do an SQL query, and the result of that SQL query has fewer or extra or differently-typed columns, then the resulting row-set will have a different record type, and so you should have a separate POD data-type that does match these records. (Example: if you join two tables with a query, you don't need the ORM to spit out two records objects with a fancy OO-style relationship between them; you just need one record type that represents the resulting joined rows, without any particular idea that it happens to be an amalgamation of data from two tables.)
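A tiny sketch of that idea in Java records (all names invented): one plain-old-data type per row shape, including a separate one for the joined result, with no modeled relationship between them.

```java
// FooRow matches the row type of `SELECT * FROM foo`, nothing more.
record FooRow(long id, String name, java.time.Instant createdAt) {}

// A join produces a *different* row shape, so it gets its own record:
// just the resulting columns, no object graph, no table "relationship".
record FooBarJoinRow(long fooId, String fooName, String barLabel) {}
```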
Would that be an "RRM" (Record-Relational Mapper)? If so, we're already on board (in a different language).
P.S. I bet you would never guess which typed language with great support for records makes it easy (for the most part) to build most type-safe SQL queries on the fly, even with projections, without explicitly defining types for every possible projection variation.
Lots of Python codebases are on board with that (and I assume codebases in other languages are as well). It's by convention rather than language-level support, but works perfectly well: don't bolt business logic onto your ORM classes' methods, handle those elsewhere in a separate layer whose conventions you define. Works great.
I get that you're talking about TypeScript, but I just want to say that Scala excels in this area as well. I found Slick to be a pleasure to work with once I embraced not trying to map directly from a relational model to an object model, and instead just used Scala's ability to manipulate and reshape the relational model as needed.
Somewhere in one of his talks, Rich Hickey goes on this great mini-rant about, "Why can't we just let data be data?" I wish I could remember exactly which one it is. Maybe "Simple Made Easy"?
> Fortunately, there are tools out there that solve the "mapping tables to objects" problem without trying to control all access to the database
...like, say, Hibernate.
I have a reasonably mature Hibernate app. It is almost exactly as easy to type createNativeQuery() as it is to type createQuery(). Many simple queries are shorter in JPQL, but for anything complex I always use SQL. I never feel like Hibernate is keeping me apart from the database.
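Roughly what that looks like side by side, assuming a `Book` entity and an `EntityManager em` in scope (the query details are invented):

```java
// Simple lookups read nicely in JPQL...
List<Book> recent = em.createQuery(
        "select b from Book b where b.publishedAt > :since", Book.class)
    .setParameter("since", since)
    .getResultList();

// ...and anything gnarly can drop straight to SQL through the same session.
// (createNativeQuery returns a raw List, so this assignment is unchecked.)
List<Book> overdue = em.createNativeQuery(
        "SELECT b.* FROM book b WHERE NOT EXISTS "
      + "(SELECT 1 FROM loan l WHERE l.book_id = b.id AND l.returned_at IS NULL)",
        Book.class)
    .getResultList();
```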
I genuinely think most people who bash ORMs just haven't gotten far enough in their SQL learning curve that they get tired of writing the same dumb I/O code over and over. ORMs never "freed you from SQL", they just freed you from boilerplate. Even venerable and much-maligned ORMs like Hibernate.
Around 10 years ago, a work colleague improved the ETL processing time of a Java application from minutes to a couple of seconds by dropping Hibernate and using direct JDBC instead.
If all one is doing is calling createNativeQuery() all the time, and having stored procedures as well, then it is just needless fat.
If you can comfortably use JDBC for data access, then your data access logic isn't very complicated. There's nothing wrong with that; use the right tool for the job.
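For reference, a sketch of the kind of direct JDBC that replaces the ORM on a bulk path (table and `Event` type made up); explicit batching, inside one transaction, is usually where the minutes-to-seconds wins come from:

```java
// Assumes a method that throws SQLException and a hypothetical Event record.
try (Connection con = dataSource.getConnection();
     PreparedStatement ps = con.prepareStatement(
         "INSERT INTO staging_event (id, payload) VALUES (?, ?)")) {
    con.setAutoCommit(false);
    for (Event e : events) {
        ps.setLong(1, e.id());
        ps.setString(2, e.payload());
        ps.addBatch();            // no per-entity session bookkeeping, just rows
    }
    ps.executeBatch();
    con.commit();
}
```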
But Hibernate will add extra complexity by itself. You can make mistakes in the mapping, then it adds multiple caches, and you might be using Spring with the open-in-view filter...
I am not saying that Hibernate (or JPA, for that matter) is bad, but it totally depends on what you are developing. Simple solutions can use Hibernate; for more complex stuff you might not want to.
Using JPA can also make your application independent of a specific database. But if you want to use advanced database features and have chosen a specific database, you cannot always use JPA.
Actually "writing queries in some language other than SQL" which has static typing and catching issues in compile time is quite big for me. Add automatic database migrations that are also keeping types in line with code and whole bunch of "mess of mapping code" goes away. Though I use .NET EntityFramework which by now is really mature and heavily invested into by MS. Not sure how it is with other environments but I think I have it too easy.
I have tried to use automatic migrations and I do not see the point. Either your diff is trivial and it is trivial to write a migration, or your diff is complex and you have to care about the lock levels of schema changes, the time complexity of changes, and whether the migration transforms data correctly. The code generating the automatic migrations needs knowledge about the structure of the production data.
Lack of adequate support for versioning your schema, and bad support for versioning in general, is one of the main dislikes I have about SQL. I find SQL to be a beautifully expressive language when you get the hang of it, but no RDBMS that I know of has been able to adequately tackle the versioning problem at both the schema and the script level. Everyone's schema evolves over time, so why is it so difficult for the RDBMS to capture the idea of database evolutions? This is the single biggest selling point for me in ORM land, because with regular (read: non-SQL) code you have all the typical version control mechanisms to help you manage changes to the schema. It's still not ideal, but a lot better than trying to manage this kind of thing in any of the tools that mainstream RDBMSs offer.
Yes, I agree. I ended up making a small DSL to describe tables, then another program to translate that into a script that checks everything through metadata in the target DB (at the moment it does MySQL). I check the table-structure files into version control. If I want an old version, I pull that checkout and run the generator script. I end up with a pure SQL script that creates a blank DB with bootstrap data, or asserts that all the columns/indexes exist for that version of the code.
There is another small DSL for bootstrap data, and I just check in all stored procs like normal code.
What this doesn't give you is good column-type-change semantics, but no other tool I've ever used can be trusted when that happens anyway; you have to manage it yourself. This type of change is pretty infrequent, though.
For me it works better to assert the schema I want now. Dropped columns aren't handled, due to similarly thorny issues that usually need a human to deal with them. Or, if it's simple, I put those in a cleanup proc that runs as the last part of the script.
Using normal code to describe structure is similar, I guess; it's just that you're fighting the language with metadata extensions, and whatever you do isn't language-agnostic. You're also at the mercy of whatever magic happens when you change the schema with that lib.
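The generated check script boils down to assertions like this one, sketched here with plain JDBC metadata rather than a DSL (table, column, and DDL are made up):

```java
// Assert that a column exists for this version of the code; add it if not.
// Real versions would also check types, indexes, etc.
try (Connection con = dataSource.getConnection()) {
    DatabaseMetaData md = con.getMetaData();
    try (ResultSet rs = md.getColumns(null, null, "document", "archived_at")) {
        if (!rs.next()) {
            try (Statement st = con.createStatement()) {
                st.executeUpdate(
                    "ALTER TABLE document ADD COLUMN archived_at DATETIME NULL");
            }
        }
    }
}
```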
This is a big thing for me. Most ORMs I've used have decent migration tools, which is a huge benefit for managing your schema. We enforce changing schemas via migrations, as opposed to any sort of auto-migrations some packages offer. It makes life a ton easier being able to put these changes into a build pipeline.
Migrations (or evolutions, depending on the language you are using) are a joy when implemented properly. Alembic and Play Evolutions are two frameworks that I have dealt with specifically that are pretty great and hard to go back from.
I'm very comfortable with SQL, but I also like the static typing and compile-time safety that LINQ brings.
Automatic migrations I like less. In particular, Entity Framework and Entity Framework Core migrations.
Firstly, it seems that the second you stray from the most basic of scenarios, they generate incorrect DDL.
Secondly, I'm just not a fan of 'magic' code gen.
Thirdly, every entity/model change a dev makes does not need to equate to a schema migration - that's how you end up quickly accumulating hundreds of schema migrations.
When I have to use MS SQL Server, the approach I favour is to use CQRS, using EF Core for the majority of cases, then using Dapper when I need more control. And for migrations I like DbUp, because it provides me with full control over when it's time to create a migration file, and what better DSL for migrations than... SQL.
Damn, that's a lot of enterprise-level words and acronyms just to "store some data". Lol, not saying you're wrong... I just don't like this part of our industry. Is it too late to go back to Turbo Pascal (record files)? :)
Not sure I really mentioned much 'enterprise-level', beyond SQL Server?
I guess I'm making it sound more complicated than it really is then, because this approach is actually a lot simpler than the 'N-tier' abstraction-hell of yesteryear.
Not to mention code reuse. Say, for instance, we have one query that finds eligible bachelors near me and another that finds newly eligible bachelors. We need two queries, but they both rely on the same underlying domain concept of "eligible bachelors". In Entity Framework it's easy to save that as an expression that can be reused in multiple places. It's hard to do that in SQL in a maintainable way.
Not to mention refactoring issues. If I decide to split a datetime field called time into a datetime and an offset, that change is far safer in C# than it is in SQL.
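For the Java folks in this thread, the same reuse trick sketched in JPA Criteria rather than EF/LINQ (entity and fields invented): the domain concept lives in one predicate-building method, and each query layers its own conditions on top.

```java
import jakarta.persistence.criteria.*; // or javax.persistence on older stacks

// The reusable "eligible bachelor" concept, defined once:
static Predicate eligibleBachelor(CriteriaBuilder cb, Root<Person> p) {
    return cb.and(cb.isFalse(p.<Boolean>get("married")),
                  cb.greaterThanOrEqualTo(p.<Integer>get("age"), 18));
}

// "eligible bachelors near me" just adds its own criteria:
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Person> q = cb.createQuery(Person.class);
Root<Person> p = q.from(Person.class);
q.where(cb.and(eligibleBachelor(cb, p),
               cb.equal(p.<String>get("city"), myCity)));
```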
> Say for instance we have one query that finds eligible bachelors near me and another that finds newly eligible bachelors. We need two queries but they both rely on the same underlying domain concept of "eligible bachelors". [...] It's hard to do that in SQL in a maintainable way.
Seems a textbook case of using a view for sharing query logic.
> If I decide to split a datetime field called time into a datetime and an offset, that change is far safer in C# than it is in SQL.
If you can do it in C# instead of SQL, you must be talking about a change in the interface used by an app (so, with good DB design, a view, not a base table). But this is very safe in SQL: you are changing a view definition and, if it's a read/write view, insert and update triggers.
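Concretely, sticking with the bachelor example from above (all names invented), a sketch of the view approach, executed here over plain JDBC:

```java
// Define the shared concept once, e.g. in a migration script:
try (Statement st = con.createStatement()) {
    st.executeUpdate(
        "CREATE VIEW eligible_bachelor AS " +
        "SELECT p.* FROM person p WHERE p.married = FALSE AND p.age >= 18");
}

// Each query then layers its own criteria on the shared definition:
PreparedStatement nearMe = con.prepareStatement(
    "SELECT * FROM eligible_bachelor WHERE city = ?");
PreparedStatement newlyEligible = con.prepareStatement(
    "SELECT * FROM eligible_bachelor WHERE became_single_at > ?");
```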
>Seems a textbook case of using a view for sharing query logic.
Views are just terrible outside of data-analyst-style work. Unmaintainable, restricted to SQL structures, require absurd hacks or custom DLLs, not properly source-controlled, difficult to perform performance analysis on, hard to update.
You can't combine views easily, you can't cache results, you don't get static type checking.
I'm not sure how that's a problem, since SQL is a fully-general data language.
> require absurd hacks or custom dlls,
Views require neither hacks nor "custom DLLs" in any RDBMS I am aware of. Can you provide a concrete example?
> not properly source controlled,
There's no reason DB schemas (including, but not limited to, view definitions) can't be source controlled.
> You can't combine views easily
Yes you can, whether by "combine" you mean as joined source relations, or as criteria that need to be combined to slice and dice the same source data. That's, like, playing right to the strength of SQL and RDBMS systems.
> you can't cache results,
Well, not generally incrementally or in something like a recently-used cache; a number of RDBMSs do support materialized views, though.
I don't like to say "you're doing it wrong", but TBH it sounds like you are.
> restricted to SQL structures
So is your ORM; it just adds a layer of abstraction
> require absurd hacks or custom dlls
Literally no idea what you're doing that views require anything out of the ordinary, or indeed "custom DLLs"
> not properly source controlled
Eh? You can store your view DDL files in source control just fine. I mean, they're just SQL files; why couldn't you?
> difficult to perform performance analysis on
I don't see how ORMs offer any advantage here; if anything, SQL has the edge, since your typical SQL GUI will draw pretty graphs to help visualise EXPLAIN, and some (e.g. SSMS) will even suggest missing indexes
> hard to update
It's just SQL, and for views typically rather basic SQL at that.
I use ORMs on 99% of the projects I work on, sometimes heavyweights like Entity Framework Core or Marten (which I love, BTW), and sometimes more lightweight like Dapper - but I know when to use the right tool for the job, and sometimes that's SQL views. I just don't get any of your arguments against them.
What SQL GUIs do you use that draw graphs for EXPLAIN statements? The ones I've used on OSX (PopSQL, Sequel Pro, TablePlus) haven't had that feature. I haven't used MySQL Workbench in a few years, so I'm not sure about that one.
I mentioned SQL Server Management Studio (SSMS), which is a great GUI for SQL Server. There is also pgAdmin (both 3.x and 4.x have this feature), and it's also available in Azure Data Studio (for SQL Server; it's coming soon for the Postgres version).
I have spent enough of my life untangling a mess of views that create performance problems to doubt that views are maintainable.
Also, there is usually no good solution for keeping the database in sync with code releases, with the result that database+code change synchronization tends to be a source of deployment complexity.
Keeping the logic in one place is better than the alternative. And of the two, the tools to manage source control are better.
>I have spent enough of my life untangling a mess of views that create performance problems to doubt that views are maintainable.
Just like concepts like inheritance can be a burden when overused, views can be overused as well. Like inheritance, having views go too many levels deep generally winds up causing more harm than good.
Used correctly, though, views can capture and centralize concepts in a way that is reusable, more easily tunable, and easier to maintain in the long run. Queries can be simpler and encompass less responsibility, as the common logic is implemented by the view and the logic that is specific to each query is more readily apparent.
IMHO thinking of the DB as just a collection of tables misses out on a key db feature.
>Also, there is usually no good solution for keeping the database in sync with code releases. With the result that database+code change synchronization tends to be a source of deployment complexity.
I disagree that there's no good solution, but I understand that it's not as "out of the box" as many ORMs offer. That being said, writing SQL scripts that coordinate schema changes is not hard, and any CI/CD pipeline that can run commandline utilities can run a sql script against a database.
At my work, we use DbUp (https://dbup.github.io/) and are able to easily write small, isolated SQL scripts that correspond with code changes and get committed to source control, run all of those scripts with our CI/CD pipeline across multiple sharded databases, and do so quite seamlessly. We don't have any questions about migrations or what a tool did, as running the SQL script individually is the same as it runs on deployment. Coordinating db changes on developer machines is a breeze as well.
>And of the two, the tools to manage source control are better.
All of our sql scripts are committed into source control. It's not any easier or harder than looking at a code file, an html file, or a config file to get to the logic your working with.
In the common setup where there is a central database shared by developers, you really can't keep the database in sync with all of the code. Furthermore the manual nature of keeping things in sync means that there will be errors. Hopefully those are caught in staging, but I've seen this not always work.
We use Liquibase https://www.liquibase.org/ for tracking, managing and applying any schema changes. The Liquibase changelogs are part of our project releases.
XML is the default on the page, but wow, the SQL is _so_ much shorter and more readable than the others! To the extent that I wonder what the point is. The obvious answer is machine readability, but then one format would do, not three, and is SQL really so machine-unreadable that it's worth the extra depth and complexity?
With Liquibase, when you write changesets using SQL, you have to write the rollback code yourself. When you write them with Liquibase's built-in operations in (for example) XML, the rollback is derived automatically.
I write SQL almost daily, but for the things that can easily be expressed with Liquibase's operations, I can write them faster and with fewer errors in XML, so long as I'm using a text editor with XML schema support (hence: autocompletion and validation as-you-type). PyCharm works well for this.
I much prefer having a VS SQL project side by side with the EF models.
Keep the schema up to date in there. It can generate migration scripts by comparing against the DB you want to target. Then just generate the EF models from the DB.
I've come to the same conclusion from using a number of query and relation mapping frameworks (e.g. Rails, Spring). I've finally decided to write one that solves the problems that I have with them. It comes down to just a few things:
1. Embrace SQL
2. Handle the ON clauses for JOINs
3. Fetch relations for sets of records (eager or lazy at the call site) in batches, without making N+1 queries
4. (bonus) async composability
I decided to write it in Java so it works with any language in the ecosystem, and having a simple type system will make it easy to port to others. Don't mind all the various typed expressions: you can just pass in an SQL string with locally resolved bindings, use that element in larger queries, and finally execute the resulting tree, which maps to the result type. I just started last week and I'd love any feedback, whether you think it's a good idea or any specifics. I'm using JDBI underneath, which is a good level of abstraction to complement it.
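To be clear, this isn't the new library's own API; it's a sketch of the plain JDBI 3 layer underneath that such fragments compose down to, with a hypothetical `Account` bean:

```java
import org.jdbi.v3.core.Jdbi;

Jdbi jdbi = Jdbi.create("jdbc:h2:mem:demo");

// An SQL fragment with a locally resolved binding, spliced into a larger query:
String eligible = "status = :status AND deleted_at IS NULL";
List<Account> active = jdbi.withHandle(h ->
    h.createQuery("SELECT * FROM account WHERE " + eligible)
     .bind("status", "ACTIVE")
     .mapToBean(Account.class)
     .list());
```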
Thanks for the validation. I was beginning to wonder why this hasn't been done already and maybe I'm the exception rather than part of a group who need/want it.
After having written everything from raw queries that populate whatever platform data structures take the result...to mothership-sized ORMs that keep you as far away from SQL as possible, I have to say a middling ORM like Dapper is optimal for me.
It's lightweight when you want it to be, but if you want to do some heavier/generic use...it can take you quite far.
A lot of ORMs are built in layers, so you can use the foundational bits or the higher-level bits at your discretion. Which is why I don't understand the binary argument. "It depends", but a lot of ORMs span the entire "It depends" spectrum.
I've come to the same conclusion over the years. In the Java/Groovy/Scala world I've written my own a couple of times, but have come to the conclusion that JDBi [1] hits the sweet spot.
Awesome you think so too. I just started making SafeQL, not a full ORM but way of composing SQL fragments with type safety and chose JDBI to build it on. [Link on a nearby sibling comment.] Since you've built some already any feedback would be helpful.
+1 to everything but the last bit. In my experience, the best compromise is something like Hibernate (or some other ORM) for the mapping and the really basic queries (find by ID, find by some random field, etc).
Possibly still Hibernate for the really advanced, crazy stuff too, though this is more doubtful. I've had pretty good success in the past with Hibernate Geospatial and full-text search, for example.
In between those two extremes, just use SQL. It can still be mapped pretty easily with Hibernate and isn't practically harder than SQL would be without Hibernate.
+1 for Hibernate. Everyone starts out hating it. The learning curve is steep and unforgiving. But it can do roughly everything SQL can plus everything you would want from an ORM.
For the few that actually read the manual, there are strongly typed Criteria queries that have the full power of HQL, which is basically DB-agnostic SQL with a few really advanced features removed. There are also lazy-loaded collections, caching, change auditing, HBM2DDL validation, mappers that can transparently encrypt/decrypt, calculated columns (calculated by Java code, not SQL), and automatic versioning. All database-agnostic.
And with the right settings, it's blazing fast. Maybe only 30% slower than native SQL.
We use MetaModel strongly typed Criteria queries for everything, and it's given us the Holy Grail: any table or column type or name we change is a compiler error until all the queries are fixed. This has allowed us to rewrite large parts of the schema with confidence that our queries won't break - silent breakage being the ultimate bane of working in SQL without an ORM.
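Roughly what one of those queries looks like, assuming a `User` entity and the annotation-processor-generated `User_` metamodel class; rename the mapped field and this stops compiling:

```java
import jakarta.persistence.criteria.*;

CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<User> q = cb.createQuery(User.class);
Root<User> u = q.from(User.class);
q.where(cb.equal(u.get(User_.email), email)); // typed attribute, no name string
List<User> matches = em.createQuery(q).getResultList();
```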
LOL, I love the way you describe that... "for the few that actually read the manual". That has basically been my experience as well, and the ones that complain the loudest are the ones that refuse to read and understand what is actually happening.
I've also run into a lot of people that loved it at first and then hated it when it first went into production because they didn't read the manual and didn't pay attention to what kind of queries they were actually generating.
I get the feeling that this is why there is often a lot of angst against it from the operations folks.
Hibernate is a microcosm of Java. If you read the manual things are fine, but it's not designed to be "wingable". It abstracts away enough of the DB that you need to understand the magic or you'll make massive blunders that haunt you later.
Given that maybe 10% of devs read the manual for anything after leaving school, there are a lot of Java and Hibernate haters.
I'm still at the "hating it" stage, but most of my Hibernate experience has been with older versions of Java (5 & 6), or with trying to deal with other people's HQL sprinkled all over. It's a lot of rope to hang oneself with, IMO. Not that native SQL isn't, but there's something to be said for an enforced separation of concerns that I feel like Hibernate breaks a little too often.
Turn on the MetaModel code generation stuff. Use Criteria with strongly typed queries for everything that's not a simple find. Far fewer potential mistakes, in combination with validation.
Always turn on HBM2DDL validation! Without this you won't know if there's a problem between entity->table mappings until a query explodes.
For the same reason, don't use HQL or raw DB queries. You lose the training wheels Hibernate is meant to give you. In the same vein, always use the MetaModel strongly typed Criteria .get() syntax. Column/table name strings are evil; you lose type safety.
Turn on collection lazy loading. Otherwise N+1 queries will be everywhere. Beware the detach() advice below when doing this: you will need to trigger the load before detaching if you want that collection to be there.
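A sketch of both halves of that, with an invented Order/OrderLine pair:

```java
import jakarta.persistence.*;
import org.hibernate.Hibernate;
import java.util.List;

@Entity
public class Order {
    @Id private Long id;
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY) // no eager fetch storms
    private List<OrderLine> lines;
    public List<OrderLine> getLines() { return lines; }
}

// Later, when you need the collection after the session is gone:
Order order = em.find(Order.class, id);
Hibernate.initialize(order.getLines()); // trigger the load inside the session
em.detach(order);                       // now safe to hand the entity around
```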
Use a JDBC proxy to monitor for slow transactions and pretty print generated SQL if needed.
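One such proxy, assuming the datasource-proxy library and its builder API; it wraps your real DataSource and logs the SQL Hibernate actually emits:

```java
import javax.sql.DataSource;
import java.util.concurrent.TimeUnit;
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder;

DataSource proxied = ProxyDataSourceBuilder
    .create(realDataSource)                    // the actual pool underneath
    .name("app-ds")
    .logQueryBySlf4j()                         // generated SQL in the app log
    .logSlowQueryBySlf4j(1, TimeUnit.SECONDS)  // flag slow statements
    .build();
```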
Be very careful with the Hibernate lifecycle. It normally returns object proxies; the objects it returns to you are not of your actual class. They have magic in them that can explode if you continue to pass the POJO around after the transaction is complete or the session is flushed. Rule of thumb: detach any entity you're passing to a different method.
Again with the lifecycle... Be aware that asking for the same object twice will not give you a copy. It will give you the same object! Reference and all! This is another reason to detach. You don't want people modifying POJOs built by Hibernate; they're magically linked to actual DB data.
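That identity behavior is easy to demonstrate (and to get burned by); Order and its setter are invented here:

```java
Order a = em.find(Order.class, 42L);
Order b = em.find(Order.class, 42L);
assert a == b;            // same managed instance, same reference
a.setStatus("SHIPPED");   // "just mutating a local object"...
// ...but at flush/commit Hibernate dirty-checks it and writes the change.
```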
Use Identity Type = Sequence. It's less dumb than a normal auto-increment PK.
Never use your entity classes directly in REST endpoints. You're in for a world of pain when refactoring DB columns if it also changes your endpoints. Use something like MapStruct to map your entities to DTOs easily.
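A minimal MapStruct sketch of that split, with invented Document/DocumentDto types; the mapping implementation is generated at compile time:

```java
import org.mapstruct.Mapper;
import org.mapstruct.factory.Mappers;

@Mapper
public interface DocumentMapper {
    DocumentMapper INSTANCE = Mappers.getMapper(DocumentMapper.class);
    DocumentDto toDto(Document entity); // endpoints return this, never the entity
}
```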
Use Lombok to greatly reduce the POJO insanity of entity classes.
Be very, very careful with session.clear(). Unless you 110% understand it, don't use it.
Be careful when trying to serialize Hibernate entities. If you have a circular reference between two entities (a bidirectional parent-child, for instance), Jackson and GSON will recurse infinitely. To prevent this, use their annotations to skip serializing in the direction that makes less sense. Mapping to MapStruct DTOs before REST also lets you prevent this directly.
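With Jackson that looks like this (invented parent/child entities); the back-reference side is the one skipped during serialization:

```java
import com.fasterxml.jackson.annotation.*;
import jakarta.persistence.*;
import java.util.List;

@Entity
public class Invoice {
    @Id private Long id;
    @OneToMany(mappedBy = "invoice")
    @JsonManagedReference          // serialized normally
    private List<InvoiceLine> lines;
}

@Entity
public class InvoiceLine {
    @Id private Long id;
    @ManyToOne
    @JsonBackReference             // omitted, breaking the recursion
    private Invoice invoice;
}
```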
For things like PKs, use primitive types to prevent Hibernate from inserting nulls. This is kind of a hack, but nice if 0 is an acceptable default value, which it often is.
Enable code-generation-style dirty checking. Otherwise, for checking whether child collections and such have been altered, Hibernate will call .equals(), which nobody ever implements correctly (or at all). Better to use the code generation dirty-checking mode so Hibernate can use magic to check fields for you.
Be aware that Hibernate doesn't care whether FKs or indexes exist. If you want things to run at realistic speeds, you still need to create these yourself.
We managed to make our entire app, with about 110 REST endpoints, entirely out of strongly typed Criteria queries. Not a single line that isn't DB-agnostic or that bypasses Hibernate. Trust me, things are much better this way; once you bypass it anywhere, nothing is certain.
There's more, I should probably write a blog post instead of novels on HN
Agree. That's the approach I took with Norm (https://github.com/dieselpoint/norm). The general idea is that I just wanted to get rid of JDBC boilerplate and use plain Java objects, but not stop using SQL. Being lightweight is the key.
Full-featured ORMs also provide migrations support, which can be nice to have. You mention Dapper, which I like. But if I were working on a .NET codebase, I like the idea of using EF for basic CRUD + migrations, then Dapper for non-trivial SQL (everything else).
I completely agree. Having worked on both a database execution engine, as well as on the application side, you could not get me to write queries in any language other than the database's native query language.
BTW, even Dapper's docs say that it only solves 95% of the problems:
> Dapper's simplicity means that many features that ORMs ship with are stripped out. It worries about the 95% scenario, and gives you the tools you need most of the time. It doesn't attempt to solve every problem.
So basically Stack Overflow used an ORM for years, and I think they still use LINQ to SQL in coexistence with Dapper.
Well said. I bitterly hated SQLAlchemy, because many times I knew perfectly well how to write a query in plain SQL but for the life of me couldn't figure out how to write the same query using the SQLAlchemy language.