I've come to a couple of conclusions over the years.
First, when you get down to it, the most-valued feature of ORMs is not the "writing queries in some language other than SQL" feature, it's the "not having to write a mess of mapping code" feature.
Second, the biggest drawbacks to ORMs all derive from the "writing queries in some language other than SQL" feature.
Fortunately, there are tools out there that solve the "mapping tables to objects" problem without trying to control all access to the database: so-called lightweight ORMs like Dapper and Sql2o. To me, these hit the sweet spot: they give you most of the increase in productivity while steering clear of most of the increase in complexity (and, by extension, decrease in maintainability).
The query DSL for ORMs is fine; it works well in the large majority of situations. When it gets problematic, you can almost always drop down to pure SQL.
The mapping part of the ORM on the other hand is a disaster. Table rows make poor OO classes. The best OO classes are "workers" that have some concrete task at hand. "Active record" style classes have no scope whatsoever - anything connected to the data can be potentially added to the model class.
So on projects with ORMs, my main problem is dealing with the responsibility spaghetti. Typically one class at the central point of the domain grows to crazy proportions. In document-related software, for example, you can potentially put everything in a "Document" entity class. "DocumentConverter", on the other hand, has a clear responsibility scope (to convert documents), but of course a converter doesn't have a database table.
So don't map DB records to objects, then. DB records are records, and their proper typing in your business logic is as records—chunks of plain old data, strongly-typed, that your (OO or otherwise) code can declare DB-side interfaces against. The only responsibilities of the module/class that owns the record type, should be getting things converted into and out of that record type.
Plain-old-data DB record types (and their respective owner modules) are to ORMs, as code-gen'ed wire record types (and their respective owner modules) are to ProtoBuf-like wire-format codec libraries. They're the containers you put data into and take it out of, to ensure it's in the right types when being sent to, or received from, your DB. No more than that.
And, corollary to this idea: the POD record-type MyDB.FooTable shouldn't be taken as a type to use to stuff in the results from any random query you run against the DB table `foo`. It should be taken to represent the specific row type that the DB table `foo` has—the one you get when you `SELECT * from foo`. If you do an SQL query, and the result of that SQL query has fewer or extra or differently-typed columns, then the resulting row-set will have a different record type, and so you should have a separate POD data-type that does match these records. (Example: if you join two tables with a query, you don't need the ORM to spit out two records objects with a fancy OO-style relationship between them; you just need one record type that represents the resulting joined rows, without any particular idea that it happens to be an amalgamation of data from two tables.)
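A minimal sketch of that idea, in Python with sqlite3 for concreteness (the two-table schema and names here are invented for illustration): the joined result set gets its own flat record type, rather than being reassembled into two related "entity" objects.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical two-table schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 42.5);
""")

# The joined row-set has its own shape, so it gets its own plain record type --
# no attempt to reconstitute two related "entity" objects with an OO link.
@dataclass(frozen=True)
class OrderWithCustomer:
    order_id: int
    customer_name: str
    total: float

rows = [
    OrderWithCustomer(*r)
    for r in conn.execute("""
        SELECT o.id, c.name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """)
]
```

The record type exists only to give the query's result a name and static shape; it carries no behavior beyond that.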
Would that be a "RRM" (Record-Relational mapper)? If so we're already on board (in a different language).
P.S. I bet you would never guess which typed language with great support for records makes it easy (for the most part) to build most type-safe SQL queries on the fly, even with projections, without explicitly defining types for every possible projection variation.
Lots of Python codebases are on board with that (and I assume codebases in other languages are as well). It's by convention rather than language-level support, but works perfectly well: don't bolt business logic onto your ORM classes' methods, handle those elsewhere in a separate layer whose conventions you define. Works great.
I get that you're talking about Typescript, but I just want to say that Scala excels in this area as well. I found Slick to be a pleasure to work with once I embraced not trying to map directly from a relational model to an object model, but instead to just use Scala's ability to manipulate and reshape the relational model as needed.
Somewhere in one of his talks, Rich Hickey goes on this great mini-rant about, "Why can't we just let data be data?" I wish I could remember exactly which one it is. Maybe "Simple Made Easy"?
Fortunately, there are tools out there that solve the "mapping tables to objects" problem without trying to control all access to the database
...like, say, Hibernate.
I have a reasonably mature Hibernate app. It is almost exactly as easy to type createNativeQuery() as it is to type createQuery(). Many simple queries are shorter in JPQL, but for anything complex I always use SQL. I never feel like Hibernate is keeping me apart from the database.
I genuinely think most people who bash ORMs just haven't gotten far enough in their SQL learning curve that they get tired of writing the same dumb I/O code over and over. ORMs never "freed you from SQL", they just freed you from boilerplate. Even venerable and much-maligned ORMs like Hibernate.
Around 10 years ago, a work colleague improved the ETL processing time of a Java application from minutes to a couple of seconds by dropping Hibernate and using direct JDBC instead.
If all one is doing is calling createNativeQuery() all the time, and having stored procedures as well, then it is just needless fat.
If you can comfortably use JDBC for data access, then your data access logic isn't very complicated. There's nothing wrong with that; use the right tool for the job.
But Hibernate adds extra complexity by itself. You can make mistakes in the mapping; then it adds multiple caches; and you might be using Spring with the open-in-view filter...
I am not stating that Hibernate (or JPA for that matter) is bad, but it totally depends on what you are developing. Simple solutions can use Hibernate; for more complex stuff you might not want to.
Using JPA can also make your application independent of a specific database. But if you want to use advanced database features and have chosen a specific database, you cannot always use JPA.
Actually, "writing queries in some language other than SQL" that has static typing and catches issues at compile time is quite big for me. Add automatic database migrations that also keep types in line with code, and a whole bunch of the "mess of mapping code" goes away. Though I use .NET Entity Framework, which by now is really mature and heavily invested in by MS. Not sure how it is with other environments, but I think I have it too easy.
I have tried to use automatic migrations and I do not see the point. Either your diff is trivial and it is trivial to write a migration, or your diff is complex and you have to care about the lock levels of schema changes, the time complexity of changes, and whether the migration transforms data correctly. The code generating the automatic migrations needs knowledge about the structure of the production data.
Lack of adequate support for versioning your schema, and in general bad support for anything versioning-related, is one of the main dislikes I have with RDBMSs. I find SQL to be a beautifully expressive language when you get the hang of it, but no RDBMS that I know of has adequately tackled the versioning problem at both the schema and the script level. Everyone's schema evolves over time, so why is it so difficult for the RDBMS to capture the idea of database evolutions? This is the single biggest selling point for me in ORM land, because with regular (read: non-SQL) code you have all the typical version control mechanisms to help you manage changes to the schema. It's still not ideal, but a lot better than trying to manage this kind of thing in any of the tools that mainstream RDBMSs offer.
Yes, I agree. I ended up making a small DSL to describe tables, then another program to translate that into a script that checks everything through metadata in the target DB (at the moment it supports MySQL). I check the table-structure files into version control. If I want an old version, I pull that checkout and run the generator script. I end up with a pure SQL script that creates a blank DB with bootstrap data, or asserts that all the columns/indexes exist for that version of the code.
There is another small DSL for bootstrap data, and I just check in all stored procs like normal code.
What this doesn't give you is good semantics for changing a column's type, but no other tool I've ever used can be trusted when that happens anyway; you have to manage it yourself. This type of change is pretty infrequent, though.
For me it works better to assert the schema I want now. Dropped columns aren't handled, due to similar thorny issues that usually need a human to deal with. Or, if it's simple, I put those in a cleanup proc that runs as the last part of the script.
Using normal code to describe structure is similar, I guess; it's just that you're fighting the language with metadata extensions, and whatever you do isn't language agnostic. You're also at the mercy of whatever magic happens when you change the schema with that lib.
This is a big thing for me. Most ORMs I’ve used have decent migration tools which is a huge benefit for managing your schema. We enforce changing Schemas via migrations as opposed to any sort of auto-migrations some packages offer. Makes life a ton easier being able to put these changes into a build pipeline.
Migrations (or evolutions, depending on the language you are using) are a joy when implemented properly. Alembic and Play Evolutions are two frameworks that I have dealt with specifically that are pretty great and hard to go back from.
I'm very comfortable with SQL, but I also like the static typing and compile-time safety that linq brings.
Automatic migrations I like less. In particular, Entity Framework and Entity Framework Core migrations.
Firstly, it seems that the second you stray from the most basic of scenarios, they generate incorrect DDL.
Secondly, I'm just not a fan of 'magic' code gen.
Thirdly, every entity/model change a dev makes does not need to equate to a schema migration - that's how you end up quickly accumulating hundreds of schema migrations.
When I have to use MS SQL Server, the approach I favour is to use CQRS, using EF Core for the majority of cases, then using Dapper when I need more control. And for migrations I like DbUp, because it provides me with full control over when it's time to create a migration file, and what better DSL for migrations than... SQL.
Damn, that's a lot of enterprise-level words and acronyms just to "store some data". Lol, not saying you're wrong... I just don't like this part of our industry. Is it too late to go back to Turbo Pascal (record files)? :)
Not sure I really mentioned much 'enterprise-level', beyond SQL Server?
I guess I'm making it sound more complicated than it really is then, because this approach is actually a lot simpler than the 'N-tier' abstraction-hell of yesteryear.
Not to mention code reuse. Say for instance we have one query that finds eligible bachelors near me and another that finds newly eligible bachelors. We need two queries, but they both rely on the same underlying domain concept of "eligible bachelors". In Entity Framework it's easy to save that as an expression that can be reused in multiple places. It's hard to do that in SQL in a maintainable way.
Not to mention refactoring issues. If I decide to change a datetime field called time to split it into a datetime and an offset, that change is far safer in C# than it is in SQL.
> Say for instance we have one query that finds eligible bachelors near me and another that finds newly eligible bachelors. We need two queries but they both rely on the same underlying domain concept of "eligible bachelors". [...] It's hard to do that in sql in a maintainable way.
Seems a textbook case of using a view for sharing query logic.
> If I decide to change a datetime field called time to split it into a datetime and offset that change is far safer in c# than it is SQL.
If you can do it in C# instead of SQL, you must be talking about a change in the interface used by an app (so, with good DB design, a view, not a base table). But this is very safe in SQL: you are changing a view definition and, if it's a read/write view, insert and update triggers.
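To make the view suggestion concrete, here is a minimal sketch (Python with sqlite3, schema and data invented for illustration) using the thread's "eligible bachelors" example: the shared domain concept is defined once as a view, and both queries build on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY, name TEXT, married INTEGER,
        city TEXT, registered_at TEXT);
    INSERT INTO people VALUES
        (1, 'Bob',   0, 'Springfield', '2024-01-01'),
        (2, 'Carol', 1, 'Springfield', '2024-01-02'),
        (3, 'Dan',   0, 'Shelbyville', '2024-06-01');

    -- The shared domain concept is defined exactly once:
    CREATE VIEW eligible_bachelors AS
        SELECT * FROM people WHERE married = 0;
""")

# Both queries reuse the view instead of duplicating the eligibility logic.
near_me = conn.execute(
    "SELECT name FROM eligible_bachelors WHERE city = ?", ("Springfield",)
).fetchall()
newly = conn.execute(
    "SELECT name FROM eligible_bachelors WHERE registered_at >= ?", ("2024-05-01",)
).fetchall()
```

If the definition of "eligible" changes, only the view changes; every consumer picks it up automatically.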
>Seems a textbook case of using a view for sharing query logic.
Views are just terrible outside of data analyst style work. Unmaintainable, restricted to SQL structures, require absurd hacks or custom dlls, not properly source controlled, difficult to perform performance analysis on, hard to update.
You can't combine views easily, you can't cache results, you don't get static type checking.
I'm not sure how that's a problem, since SQL is a fully-general data language.
> require absurd hacks or custom dlls,
Views require neither hacks nor "custom DLLs" in any RDBMS I am aware of. Can you provide a concrete example?
> not properly source controlled,
There's no reason DB schemas (including, but not limited to, view definitions) can't be source controlled.
> You can't combine views easily
Yes you can, whether by "combine" you mean as joined source relations, or as criteria that need to be combined to slice and dice the same source data. That's, like, playing right to the strength of SQL and RDBMS systems.
> you can't cache results,
Well, not generally incrementally or in something like a recently-used cache; a number of RDBMS's do support materialized views, though.
I don't like to say "you're doing it wrong", but TBH it sounds like you are.
> restricted to SQL structures
So is your ORM; it just adds a layer of abstraction
> require absurd hacks or custom dlls
Literally no idea what you're doing that views require anything out of the ordinary, or indeed "custom DLLs"
> not properly source controlled
Eh? You can store your view DDL files in source control just fine. I mean, they're just SQL files; why couldn't you?
> difficult to perform performance analysis on
I don't see how ORMs offer any advantage here - if anything, SQL has the edge here, since your typical SQL GUI will draw pretty graphs to help visualise EXPLAIN, and some (e.g. SSMS) will even suggest missing indexes
> hard to update
It's just SQL, and for views typically rather basic SQL at that.
I use ORMs on 99% of the projects I work on, sometimes heavyweights like Entity Framework Core or Marten (which I love, BTW), and sometimes more lightweight like Dapper - but I know when to use the right tool for the job, and sometimes that's SQL views. I just don't get any of your arguments against them.
What SQL GUIs do you use that draw graphs for EXPLAIN statements? The ones I've used for OSX (PopSQL, Sequel Pro, TablePlus) haven't had that feature. I haven't used MySQL Workbench in a few years, so not sure about that one.
I mentioned SQL Server Management Studio (SSMS), which is a great GUI for SQL Server. There is also pgAdmin (both 3.x and 4.x have this feature), and it's also available in Azure Data Studio (for SQL Server; it's coming soon for the Postgres version).
I have spent enough of my life untangling a mess of views that create performance problems to doubt that views are maintainable.
Also, there is usually no good solution for keeping the database in sync with code releases. With the result that database+code change synchronization tends to be a source of deployment complexity.
Keeping the logic in one place is better than the alternative. And of the two, the tools to manage source control are better.
>I have spent enough of my life untangling a mess of views that create performance problems to doubt that views are maintainable.
Just like concepts like inheritance can be a burden when overused, views can be overused as well. Like inheritance, having views go too many levels deep generally winds up causing more harm than good.
Used correctly though, views are able to capture and centralize concepts in a way that are reusable, can be more easily tuneable, and make maintenance easier in the long run. Queries can be simpler and encompass less responsibility as the common logic is implemented by the view, and the logic that is specific to that query is more readily apparent.
IMHO thinking of the DB as just a collection of tables misses out on a key db feature.
>Also, there is usually no good solution for keeping the database in sync with code releases. With the result that database+code change synchronization tends to be a source of deployment complexity.
I disagree that there's no good solution, but I understand that it's not as "out of the box" as many ORMs offer. That being said, writing SQL scripts that coordinate schema changes is not hard, and any CI/CD pipeline that can run commandline utilities can run a sql script against a database.
At my work, we use DbUp (https://dbup.github.io/) and are able to easily write small, isolated SQL scripts that correspond with code changes and get committed to source control, run all of those scripts with our CI/CD pipeline across multiple sharded databases, and do so quite seamlessly. We don't have any questions about migrations or what a tool did, as running the SQL script individually is the same as it runs on deployment. Coordinating db changes on developer machines is a breeze as well.
>And of the two, the tools to manage source control are better.
All of our SQL scripts are committed into source control. It's not any easier or harder than looking at a code file, an HTML file, or a config file to get to the logic you're working with.
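The DbUp-style workflow described above (numbered SQL scripts plus a journal table recording what has already run) can be sketched in a few lines. This is a toy Python/sqlite3 version, with the scripts inlined rather than read from files as they would be in a real pipeline:

```python
import sqlite3

# Scripts are invented for illustration; real ones live as files in source
# control and are read from disk in name order.
migrations = [
    ("0001_create_users.sql",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"),
    ("0002_add_email.sql",
     "ALTER TABLE users ADD COLUMN email TEXT;"),
]

def migrate(conn):
    # The journal table records which scripts have already been applied.
    conn.execute("""CREATE TABLE IF NOT EXISTS schema_journal (
                        name TEXT PRIMARY KEY,
                        applied_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_journal")}
    for name, sql in migrations:
        if name in applied:
            continue  # this script already ran against this database
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_journal (name) VALUES (?)", (name,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # second run is a no-op: the journal says everything ran
```

Because each script is plain SQL, running one by hand is identical to what the deployment pipeline does, which is exactly the property described above.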
In the common setup where there is a central database shared by developers, you really can't keep the database in sync with all of the code. Furthermore the manual nature of keeping things in sync means that there will be errors. Hopefully those are caught in staging, but I've seen this not always work.
We use liquibase https://www.liquibase.org/ for tracking, managing and applying any schema changes. The liquibase changelogs are part of our project releases.
XML is the default on the page, but wow, the SQL is _so_ much shorter and more readable than the others! To the extent that I wonder what the point is; the obvious answer is machine readability, but that calls for one format, not three, and is SQL really so machine-unreadable that it's worth the extra depth and complexity?
With Liquibase, when you write changesets using SQL, you have to write the rollback code yourself. When you write them with Liquibase's built-in operations in (for example) XML, the rollback is derived automatically.
I write SQL almost daily, but for the things that can easily be expressed with Liquibase's operations, I can write them faster and with fewer errors in XML, so long as I'm using a text editor with XML schema support (hence: autocompletion and validation as-you-type). PyCharm works well for this.
I much prefer having a VS SQL project side by side with the EF models.
Keep the schema up to date in there; it can generate migration scripts by comparing against the DB you want to target. Then just generate the EF models from the DB.
I've come to the same conclusion from using a number of query and relation mapping frameworks (e.g. Rails, Spring). I've finally decided to write one that solves the problems that I have with them. It comes down to just a few things:
1. Embrace SQL
2. Handle the ON clauses for JOINs
3. Fetch relations of sets of records (eager or lazy at the call site) in batches, without making N+1 queries
4. (bonus) async composability
I decided to make it in Java so it works with any language in the ecosystem, and having a simple type system will make it easy to port to others. Don't mind all the various typed expressions: you can just pass in an SQL string with locally resolved bindings, use that element in larger queries, then finally execute the resulting tree that maps to the result type. I just started last week and I'd love any feedback, like whether you think it's a good idea, or any specifics. I'm using JDBI underneath, which is a good level of abstraction to complement it.
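The "SQL fragment with locally resolved bindings, composed into larger queries" idea can be illustrated with a toy sketch. This is Python rather than Java, and the `Fragment` class is invented for illustration; it is not SafeQL's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A piece of SQL that carries its own parameter bindings."""
    sql: str
    params: tuple = ()

    def __add__(self, other: "Fragment") -> "Fragment":
        # Composing fragments keeps placeholders and bindings aligned,
        # so a larger query can never mismatch its parameters.
        return Fragment(self.sql + " " + other.sql, self.params + other.params)

base   = Fragment("SELECT id, name FROM users")
active = Fragment("WHERE active = ?", (1,))
recent = Fragment("AND created_at >= ?", ("2024-01-01",))

query = base + active + recent
# query.sql and query.params can now go straight to any DB driver.
```

The payoff is that each fragment is independently reusable, and the final `(sql, params)` pair is built without any string interpolation of values.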
Thanks for the validation. I was beginning to wonder why this hasn't been done already and maybe I'm the exception rather than part of a group who need/want it.
After having written everything from raw queries that populate whatever data structures the platform provides... to mothership-sized ORMs that keep you as far away from SQL as possible, I have to say a middling ORM like Dapper is optimal for me.
It's lightweight when you want it to be, but if you want to do some heavier/generic use...it can take you quite far.
A lot of ORMs are built in layers, so you can use the foundational bits or the higher-level bits at your discretion. Which is why I don't understand the binary argument. "It depends", but a lot of ORMs span the entire "it depends" spectrum.
I've come to the same conclusion over the years. In the Java/Groovy/Scala world I've written my own a couple of times, but have concluded that JDBI [1] hits the sweet spot.
Awesome you think so too. I just started making SafeQL, not a full ORM but way of composing SQL fragments with type safety and chose JDBI to build it on. [Link on a nearby sibling comment.] Since you've built some already any feedback would be helpful.
+1 to everything but the last bit. In my experience, the best compromise is something like Hibernate (or some other ORM) for the mapping and the really basic queries (find by ID, find by some random field, etc).
Possibly still hibernate for the really advanced crazy stuff, too, though this is more doubtful. I've had pretty good success in the past with Hibernate Geospatial and full text search, for example.
In between those two extremes, just use SQL. It can still be mapped pretty easily with Hibernate and isn't practically harder than SQL would be without Hibernate.
+1 for Hibernate. Everyone starts out hating it. The learning curve is steep and unforgiving. But it can do roughly everything SQL can plus everything you would want from an ORM.
For the few that actually read the manual, there's strongly typed Criteria queries that have the full power of HQL. Which is basically DB agnostic SQL with a few really advanced features removed. There's also lazy loaded collections, caching, change auditing, HBM2DDL validation, mappers that can transparently encrypt/decrypt, calculated columns (calculated by Java code, not SQL), and automatic versioning. All database agnostic.
And with the right settings, it's blazing fast. Maybe only 30% slower than native SQL.
We use all MetaModel strongly typed Criteria queries, and it's given us the Holy Grail. Any table or column type or name we change is a compiler error until all the queries are fixed. This has allowed us to rewrite large parts of the schema with confidence that our queries won't break; broken queries are the ultimate bane of working in SQL without an ORM.
LOL, I love the way you describe that... "for the few that actually read the manual". That has basically been my experience as well, and the ones that complain the loudest are the ones that refuse to read and understand what is actually happening.
I've also run into a lot of people that loved it at first and then hated it when it first went into production because they didn't read the manual and didn't pay attention to what kind of queries they were actually generating.
I get the feeling that this is why there is often a lot of angst against it from the operations folks.
Hibernate is a microcosm of Java. If you read the manual things are fine, but it's not designed to be "wingable". It abstracts away enough of the DB that you need to understand the magic or you'll make massive blunders that haunt you later.
Given that maybe 10% of devs read the manual for anything after leaving school there's a lot of Java and Hibernate haters
I'm still at the "hating it" stage, but most of my Hibernate experience has been with older versions of Java (5 & 6), or with trying to deal with other people's HQL sprinkled all over. It's a lot of rope to hang oneself with, IMO. Not that native SQL isn't, but there's something to be said for an enforced separation of concerns, which I feel Hibernate breaks a little too often.
Turn on the MetaModel code generation stuff. Use Criteria with strongly typed queries for everything that's not a simple find. Many fewer potential mistakes, in combination with validation.
Always turn on HBM2DDL validation! Without this you won't know if there's a problem between entity->table mappings until a query explodes.
For the same reason, don't use HQL or raw DB queries. You lose the training wheels Hibernate is meant to give. In the same vein, always use the MetaModel strongly typed Criteria .get() syntax. Column/table name strings are evil; you lose type safety.
Turn on collection lazy loading. Otherwise N+1 queries will be everywhere. Beware the detach() advice below when doing this: you will need to trigger the load before detaching if you want that collection to be there.
Use a JDBC proxy to monitor for slow transactions and pretty print generated SQL if needed.
Be very careful with the Hibernate lifecycle. It normally returns object proxies; the objects it returns to you are not of your actual class. They have magic in them that can explode if you continue to pass the POJO around after the transaction is complete or the session is flushed. Rule of thumb: detach any entity you're passing to a different method.
Again with lifecycle... Be aware that asking for the same object twice will not give you a copy. It will give you the same object, reference and all! This is another reason to detach. You don't want people modifying POJOs built by Hibernate; they're magically linked to actual DB data.
Use Identity Type = Sequence. It's less dumb than the normal auto-increment PK.
Never use your entity classes directly in REST endpoints. You're in for a world of pain when refactoring DB columns if it also changes your endpoints. Use something like MapStruct to map your entities to DTOs easily.
Use Lombok to greatly reduce the POJO insanity of entity classes.
Be very very careful with session.clear() . Unless you 110% understand it, don't use it.
Be careful when trying to serialize Hibernate entities. If you have a circular reference between two entities (bidirectional parent-child, for instance), Jackson and GSON will recurse infinitely. To prevent this, use their annotations to stop serialization in the direction that makes less sense. Mapping to MapStruct DTOs before REST will let you prevent this directly.
For things like PKs, use primitive types to prevent Hibernate from inserting nulls. This is kind of a hack, but nice if 0 is an acceptable default value, which it often is.
Enable code-generation-style dirty checking. Otherwise, to check whether child collections and such have been altered, Hibernate will call .equals(), which nobody ever implements correctly (or at all). Better to use the code-generation dirty-checking mode so Hibernate can use magic to check fields for you.
Be aware that Hibernate doesn't care whether FKs or indexes exist. If you want things to run at realistic speeds, you still need to create these yourself.
We managed to make our entire app, with about 110 REST endpoints, entirely strongly typed Criteria queries. Not a single line that isn't DB agnostic or that bypasses Hibernate. Trust me, things are much better this way. Once you cross the barrier, nothing is certain.
There's more, I should probably write a blog post instead of novels on HN
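Most of the tips above are Hibernate-specific, but the N+1 point is language-agnostic and worth a sketch. The batched alternative to one-query-per-parent is a single IN query over the collected keys; shown here in Python/sqlite3 with an invented users/posts schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

users = conn.execute("SELECT id, name FROM users").fetchall()

# N+1 version would run one posts query per user.
# Batched version: one IN query over all the collected user ids.
ids = [u[0] for u in users]
placeholders = ",".join("?" * len(ids))
posts_by_user = {}
for user_id, title in conn.execute(
    f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders})", ids
):
    posts_by_user.setdefault(user_id, []).append(title)
```

This is essentially what an ORM's batch fetch mode does under the hood: two queries total, regardless of how many parents there are.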
Agree. That's the approach I took with Norm (https://github.com/dieselpoint/norm). The general idea is that I just wanted to get rid of JDBC boilerplate and use plain Java objects, but not stop using SQL. Being lightweight is the key.
Full featured ORMs also provide migrations support - which can be nice to have. You mention Dapper, which I like. But if I were working on a .NET codebase I like the idea of using EF for basic CRUD + migrations, then Dapper for non trivial SQL (everything else).
I completely agree. Having worked on both a database execution engine, as well as on the application side, you could not get me to write queries in any language other than the database's native query language.
BTW, even Dapper's docs state that it only solves 95% of the problems:
> Dapper's simplicity means that many features that ORMs ship with are stripped out. It worries about the 95% scenario, and gives you the tools you need most of the time. It doesn't attempt to solve every problem.
So basically Stack Overflow used an ORM for years, and I think they still use LINQ to SQL in coexistence with Dapper.
Well said. I bitterly hated SQLAlchemy because many times I knew perfectly well how to write a query in plain SQL, but for the life of me I couldn't figure out how to write the same query using the SQLAlchemy language.
For everyone complaining about ORM-something: take a look at Elixir's Ecto. It basically allows writing SQL via native code. Good for composability, type casting, stuff like that.
Account
|> where(active: true)
|> join(:left, [a], p in Post, on: p.account_id == a.id)
|> where([a, p], a.foo == "bar" or p.bar == "foo")
|> group_by([a, p], a.id)
|> select(...)
|> limit(10)
...
|> Repo.all()
Plus there are changesets for data validations (they can e.g. catch and return uniqueness errors via a unique index)
P.S. For update/insert queries it doesn't make sense to write an UPDATE statement by hand:
UPDATE accounts SET ... 50 fields ... WHERE id = 1
P.P.S. changeset example:
def changeset(record, attrs) do
record
|> cast(attrs, [:number, :note])
|> validate_required([:number])
|> validate_format(:number, number_format())
|> unique_constraint(:number, name: "foo_bar_index")
end
will return a nice error if the number is missing (_before_ running the SQL query) or already taken (_after_ running a query, on DB error)
Came here to say this. Ecto is solid. Also, the results come back as elixir structs and not bloated objects so it’s much faster than ORMs in many other languages. Additionally, the functional approach makes everything much less magical.
Changesets can take a little while to grok, but they work.
The somewhat tricky part can be converting the tabular result set into a graph of objects, but I think it's better to handle this in your application code as needed, rather than delegate it to a library.
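A minimal version of that "handle it in application code" approach: group the flat joined rows by parent key as you iterate. (Python, with invented author/book data standing in for a real result set.)

```python
# Flat joined rows: (author_id, author_name, book_title), parents repeated,
# as you'd get from a one-to-many JOIN.
rows = [
    (1, "Le Guin", "The Dispossessed"),
    (1, "Le Guin", "The Left Hand of Darkness"),
    (2, "Borges",  "Ficciones"),
]

def rows_to_graph(rows):
    """Group flat rows into one dict per parent, each with a list of children."""
    authors = {}
    for author_id, name, title in rows:
        author = authors.setdefault(author_id, {"name": name, "books": []})
        author["books"].append(title)
    return list(authors.values())

graph = rows_to_graph(rows)
```

Ten lines of application code, shaped exactly to the query at hand, instead of a generic object-graph hydration layer.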
While I lean towards not using an ORM, the productivity gains (at the very least early on in development) are undeniable. What I've always looked for are frameworks that give you an ORM but also make lower-level queries very easy, normally via a query builder, allowing you to go back and forth between levels of abstraction. If I had to choose, I prefer libraries that give you the lower level of abstraction first and then build upon it to offer an OOP-based ORM approach (which is what I believe most people think of when they say "ORM").
One of the best libraries I've ever seen get this right was TypeORM[0]. It was easy to get started with, includes consideration for migrations, allows you to use both the query building and annotated-class approaches where appropriate, also allowing for use of the repository pattern if you're comfortable with that, and has pretty great support for lots of different backends (I've used postgres the most though). All of this from a F/OSS project (~2 years ago I was also involved in a C# project during the switch from .NET 4.x to .NET core/standard and was very very annoyed that things I could easily do with TypeORM weren't available/worked out yet in EF core at the time).
I loved TypeORM initially, but came to the conclusion that it was written by people who really got TypeScript, but didn't really get SQL.
For example, we were converting from Sequelize, where we were catching unique constraint violations and responding appropriately, and we got quite confused that in TypeORM those errors never got thrown ... until we discovered that TypeORM decides unilaterally (and almost unbelievably) to replace your original record in that situation.
We have since ripped out all third-party ORMs and migrated to a system of typed helper functions and tagged templates (where all the typings are generated automatically at compile time by inspecting the database). This is really rather wonderful: we now have full access to all sorts of goodies like native UPSERT, we know exactly what SQL we're going to get every time we touch the database, and yet the ergonomics are excellent, because everything going in and out of the DB is fully typed, including even column and table names in basically-raw SQL. A blog post (and perhaps eventually a library) is in the works ...
I couldn't agree with this more. I've just started using TypeORM and the TypeScript aspect is great. However, here I am trying to do a simple distinct left join, but it's extremely difficult to write because the query APIs consistently get in the way.
We're regularly receiving invalid SQL errors, which from a typed language/API seems particularly odd.
We're doing this exact same query on the exact same DB in ActiveRecord/Ruby and it's trivial to both write and understand.
This is where I land also. ORMs are excellent for the repetitive stuff, especially when integrated with frameworks, but I don't want the presence of an ORM to prevent me from accessing the more advanced aspects of my database. I'm a heavy Postgres user and there's just... so much that it's capable of that hiding it behind an ORM to pretend it's "just a database" feels like a bit of a tragedy.
One of the better balances I ever came across from multiple languages was actually ActiveRecord because of the Scopes functionality. The scopes let you abstract certain parts of queries and name them, reuse them, include parameters, combine multiples of them back together, switch parts out, etc. You can combine raw snippets with fully abstracted parts.
It's extremely flexible and probably the feature of Rails that I miss most when I don't have it.
You can do this with most any ORM by mapping tables on top of views defined in SQL. I regularly use this pattern with Django's ORM to make complex aggregations only a foreign key away. It lets me write detailed, performant SQL that would be impossible to coax out of an ORM.
The DataMapper pattern is very much included in the normal discussion of what an ORM is expected to provide -- so much so that almost every larger ORM (that I know of, at least) is listed on the Wiki[0], including TypeORM.
That said, you're right, the DataMapper pattern isn't quite identical, and it is a piece of fully OOP-based ORMs that I think are on the right side of the spectrum. I'd say that the DataMapper pattern fits in the middle -- I call it Serialization/Deserialization, and assuming the syncing of data is not constant (as in triggering action on another thread or something), then it can be anything from a function (ex. db::insert_user(user), user.write_to_db(db) or Adapter.save(obj)) to a dedicated object (Adapter.update(db, obj)).
If I had to try and point to a difference between the two, I'd say that DataMapper-ish or DataMapper-and-below levels of abstraction expose more internals by default (which is obvious) -- I find that ORMs on the right side of the spectrum I sketched up top try to never expose their internals, which quickly falls apart as soon as you step off the paved/well-traveled path.
[EDIT] I want to note that dealing with this syncing behavior in EF (Entity Framework) was really annoying. The constant management of a pool of objects that "reflected" the objects in the DB and had to be saved/flushed all together before they were manipulated got old fast. That said, I am now aware that C# is one of my greatest weaknesses, so take this opinion with a grain of salt; it's entirely possible that I just didn't have the skill to do it right and keep it in my mind the right way. Just to make sure I'm not unfairly bashing EF, I went and found a relevant SO post to show the kind of stuff I didn't like having to look up/deal with [1]. I'm picking on EF, but this is likely an issue with any ORM that handles it this way (with the object pooling & syncing).
There's another big aspect of ORMs a lot of people tend to skip in discussion: Security.
Raw SQL can be dangerous, and given enough people and code somebody will eventually make a mistake (as is human) and introduce a vector for a SQL injection attack or some other DB specific vulnerability.
A good ORM can be a fairly effective layer of safety.
IMO it is a humongous red flag if someone on your team talks about "raw SQL" and does not mean parametrized queries. I guess I could say I was fairly fortunate in my education/early career but this is a lesson you have to learn early when working with user input and databases.
Often they don't mean that -- they really do just mean "run any SQL you want". I just worked on a project that explicitly wrote its own bolted-on "db layer" and avoided the built-in ORM, which had parameterized-query support. I now know what 1200 injection-capable SQL queries look like...
It can be, but it is hardly the only solution to that problem. I've also seen it solved, for example, with a git commit hook that just bounced any non-parameterized queries.
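To make the contrast concrete, here is a minimal Python sketch (using the stdlib sqlite3 module; the table and data are made up) of why "raw SQL" has to mean parameterized queries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Dangerous: user-controlled text concatenated straight into the SQL string.
evil = "alice' OR '1'='1"
injected = conn.execute(
    "SELECT * FROM users WHERE name = '" + evil + "'").fetchall()
print(len(injected))  # -> 2, the injected OR clause matches every row

# Safe: the driver sends the value separately; it is never parsed as SQL.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (evil,)).fetchall()
print(len(safe))  # -> 0, nobody is literally named that string
```

The same `?` placeholder discipline is exactly what a commit hook (or code review) would be checking for.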
I wish people did not think the choice was solely between "write raw SQL with raw strings" and "try to pretend the database is object-oriented when it is not."
The third approach is to safely wrap the database and its columns with code in a way that is composable.
In Python, SQLAlchemy has an ORM, but it is optional, and you can just work with tables and columns if you want.
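As a toy illustration of this third approach (all names hypothetical; SQLAlchemy Core is the production-grade version of the idea), conditions can be small values that carry a SQL fragment plus its parameters and compose without string concatenation:

```python
import sqlite3
from dataclasses import dataclass

# A condition is a SQL fragment plus its bound parameters; AND-ing two
# conditions just concatenates fragments and parameter tuples.
@dataclass
class Cond:
    sql: str
    params: tuple

    def __and__(self, other: "Cond") -> "Cond":
        return Cond(f"({self.sql}) AND ({other.sql})", self.params + other.params)

def col_eq(column: str, value) -> Cond:
    return Cond(f"{column} = ?", (value,))

def col_ge(column: str, value) -> Cond:
    return Cond(f"{column} >= ?", (value,))

def select(conn, table: str, cond: Cond):
    # Parameters ride along with the fragment, so the query stays safe.
    return conn.execute(f"SELECT * FROM {table} WHERE {cond.sql}", cond.params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Ann", 17), ("Bob", 30), ("Cat", 30)])

adults_named_bob = col_ge("age", 18) & col_eq("name", "Bob")
print(select(conn, "person", adults_named_bob))  # -> [('Bob', 30)]
```

The point is composability: pieces of queries are ordinary values you can name, reuse, and combine, without pretending rows are objects.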
The real third approach is that you can safely pretend the database is object-oriented for manipulation and simple lists and still use SQL for complex queries. Most ORMs let you safely mix and match both methods easily.
This "ORM or no ORM" framing is the wrong question. Use an ORM to save yourself headaches where it's appropriate and use direct SQL where it's not.
You are absolutely correct. Using both is a very valid option.
We use ActiveRecord a lot, and then have custom SQL queries using `find_by_sql` for very complex, optimized joins. It works very well. Rails gets out of the way when we need it to.
Two caveats with find_by_sql: it’s read-only, so no insert or update commands, and it still does column-to-instance-variable monkeypatching on the object level, as opposed to the class-level monkeypatching that’s applied to normal ActiveRecord classes as soon as the DB schema is read.
For the former, there's always ActiveRecord::Base.connection.execute. For the latter, I think it's more complicated than that. There is also object-level mapping even for regular AR usage: if you do something like Foo.select("true as bar"), your Foo objects will have a bar variable available to them.
I use this approach with Mongoose in Node.JS. If you want to update a user document, we always query the entire document, set the field, and call .save(). This triggers all kinds of very useful validation hooks and is easy to think about. But if you want to find a series of users, or build an API for a specific front-end form, I've found the ORM just gets in the way, and we have not had any trouble writing db queries inline.
We used to use Mongoose in our project[0], but found that at the volumes of interactions we were dealing with, it was orders of magnitude faster to just use the raw Node.JS driver.
> The third approach is to safely wrap the database and its columns with code in a way that is composable.
Like Haskell's Esqueleto[1] which lets you work with SQL code as Haskell functions and values. As an example from the documentation:
    select $ from $ \p -> do
        where_ (p ^. PersonAge >=. just (val 18))
        return p

results in roughly:

    SELECT * FROM Person
    WHERE Person.age >= 18

This lets you define new functions to use in your queries that use SQL functions. For example, you can define `isPrefixOf` for SQL using the SQL functions `char_length()` and `left()`. The use of the function will be expanded into the corresponding SQL code when the query runs. Like,

    select $ from $ \p -> do
        where_ $ val "John" `isPrefixOfE` p ^. PersonName
        return p

could result in:

    SELECT * FROM Person
    WHERE LEFT(Person.name, CHAR_LENGTH('John')) = 'John'
For a long time these were the main choices, and the third way had very little library support; it generally required you to roll your own, which was usually crappy and inefficient. Even when "lightweight ORMs" were first gaining traction their composability was quite poor (and in some ways it still is), and they would often leave the denormalisation part to the application.
Even though I think this third way should be the "default" way to write an application now there's an awful lot of code left over from when an ORM was better.
There are so many incompatible needs for databases that most DB wrappers of any kind rarely make sense except when prototyping. And database wrapper authors have a tendency to cater to lowest common denominator of features so that they can treat all databases the same, which is going even further in the wrong direction. Choosing a database requires learning what your requirements are, learning what the candidates are, and when you pick one, learning how to use it correctly. People and companies rarely seem to do all of these steps.
The "write-raw-SQL-with-raw-strings" approach has one serious issue -- SQL injection. Some people argue that it is not so hard to filter strings before concatenating them into a SQL query, but I also know people who argue that it is not so hard to write C code without introducing bugs around NULL and wild pointers; one just needs to be careful. I, personally, do not believe the "be careful" strategy can work reliably here.
I cannot imagine working with SQL without a query builder, which introduces type checks and conversions, and escapes strings when needed. An ORM is optional, but it is nice to have some layer that maps rows into structs and vice versa. With dynamically typed scripts it doesn't matter much: either way you get a hash table (the only difference is the syntax used). But with a compiled language and static typing I feel some uneasiness using slow hash-table mapping instead of blazingly fast struct field access, and tooling can help declare those structs statically.
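That row-to-struct layer can be sketched in a few lines; here is a Python version (dataclasses standing in for structs, names invented), in line with the idea that the layer's only job is converting rows into and out of a plain record type:

```python
import sqlite3
from dataclasses import dataclass

# A plain record type: no behavior, just typed fields -- the "struct"
# alternative to handing every row back as a generic hash table.
@dataclass(frozen=True)
class User:
    id: int
    name: str

def fetch_users(conn) -> list[User]:
    # The mapping layer's single responsibility: rows in, records out.
    rows = conn.execute("SELECT id, name FROM users ORDER BY id")
    return [User(*row) for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
print(fetch_users(conn))  # -> [User(id=1, name='alice'), User(id=2, name='bob')]
```

In a statically typed language the equivalent code would give you compile-time field access instead of string-keyed lookups.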
I currently work on a project that is based on Django ORM. Before this, I almost exclusively hand-wrote all SQL.
I think you can quickly outgrow the limits of an ORM... at least Django's. Whether it's needlessly fighting with the ORM to get joins correct, the ORM deciding it's going to loop through n records instead of joining on the DB server, simply doing complex aggregates the ORM won't support, or doing DB-specific stuff. Postgres has some amazingly powerful features that many don't know about because they only ever learn the ORM.
My experience: it can end up being double the work, because you spend an hour getting the query to work with the ORM. Then, a week later, the requirements change and you have to add one thing for which the ORM is just not feasible, and you end up rewriting it completely in hand-written SQL.
ORM seems really popular with web/mobile APIs now, and hand-in-hand goes a trend of not learning SQL. I think that's too bad, as it creates another layer of separation from knowing how your app truly works. It's a layer that is still important to be familiar with, especially as a project grows.
Using custom SQL with Django has been a mixed bag for me. If you dump JSON from your query, some fields won't be consistently formatted with endpoints that still use the ORM. You can "load" the result into a Django model, but it can be difficult to make it work if you have custom fields you're returning from your custom SQL since Django doesn't know how to format them. (maybe there is a solution to this I haven't found?).
I like the approach of using a basic query-builder for routine insert/update and maybe even select.
I'm a former Oracle DBA and I love the Django ORM. After you learn it, it does almost anything I want, and when I want custom SQL I just write custom SQL.
Learning it well is important though, and inspecting the SQL that was generated for your test cases after you've implemented a feature is also important.
It can do multiple joins with same table when needed, it can do filtered prefetch, it can do subqueries, it can do exists(), it can do group by, it lets you use SQL functions Django doesn't know about out of the box, etc.
But I think the documentation is not intuitive, and in the complex cases the ORM code looks complicated and I think I prefer writing SQL in that case because there is greater chance that the next developer will know SQL than that of them knowing advanced Django ORM features. Also, more chance of getting stack-overflow help for SQL.
I got started with ORMs before I knew any SQL, and I wish I had learned to just write SQL earlier in my coding career. I've spent way too many hours wrestling with ORMs that couldn't quite do some query I wanted to write. I think it would have been better to write some SQL and have people push back w/ a better way to write a query than to try & purge everything that looked like SQL from ORM queries. Having learned SQL has made life so much easier for me!
In my experience ORM always shoots you in the foot when your use case exceeds what an ORM can provide. ActiveRecord has been a terrible mess for us because the facade masks the underlying behavior (transactional behavior) or often doesn't do what it advertises (commit hooks, timestamps, auto-increment).
ORM is great for startups trying to build out an MVP, but beyond that scale it's an anti-pattern.
The best solution I've ever seen is jOOQ, which is essentially typified SQL. Your queries have to pass compilation, but the DSL is essentially SQL itself. It's an added layer of security yet still manages to feel like it's easier to write than raw SQL.
More languages need a jOOQ. It's so good that I'd choose Java just for the ability to leverage jOOQ.
I came here to say that any article discussing the benefits/downsides of ORMs should include jOOQ. It takes such an interesting spot in the design space: not an ORM, but a super flexible, type-safe query builder with lots of helpers so you don't write more code than necessary.
I work at a place with a sort of homebuilt ActiveRecord system made of generated code. jOOQ did all the same mapping and object creation just by inspecting my database with a bunch of boilerplate Maven XML, so I wouldn't have to rely on the homebuilt version for SQL joins.
I recently put a lot of refactoring work in a java app. The enterprise architects made a list of decisions that amounted to forbidding an ORM, so a lot of JDBC/ResultSet boilerplate was included. The program had a tendency to UPDATE a few of the columns, then SELECT the same data right back a few lines later. Plenty of inconsistencies in the logic. This means refactoring went like this:
First step: create, for each table, generic methods to insert, update, and select all fields. Now at least memory and DB are consistent.
Next step: All these methods are mostly identical and there is a lot of commons-beanutils in there. So add an annotation to the relevant getters and generate most SQL statements on the fly.
Next step: Replace the incoherent commit/rollback mess with clear boundaries. Either it succeeds or it fails.
Next step: To stop the never ending reloads of data already in memory, SELECT data only when it's not already there. Flush all caches on commit.
All of this gives great results. Batch run time went down from 20+ minutes to a few seconds. Most weird crashes disappear. The need for manual data fixup after crashes evaporates. And then it hits me: I just wrote a custom ORM.
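The first steps of that accidental ORM can be sketched quite compactly; here is a hedged Python analogue (hypothetical names, sqlite3 standing in for JDBC) of generating the per-table insert/select from a record type's fields:

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

# One plain record type per table; the generic helpers below derive the
# SQL from its field list, so every table gets consistent insert/select.
@dataclass
class Customer:
    id: int
    name: str
    email: str

def insert(conn, table, obj):
    cols = [f.name for f in fields(obj)]
    marks = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})",
        astuple(obj))

def select_all(conn, table, cls):
    cols = ", ".join(f.name for f in fields(cls))
    return [cls(*row) for row in conn.execute(f"SELECT {cols} FROM {table}")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, email TEXT)")
insert(conn, "customer", Customer(1, "Ada", "ada@example.com"))
print(select_all(conn, "customer", Customer))
# -> [Customer(id=1, name='Ada', email='ada@example.com')]
```

The annotation step in the story corresponds to the dataclass field list here: one declaration drives both directions of the mapping.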
The ORM discussion reminds me of something Rich Hickey said in his talk, “Simple Made Easy”. He distinguished between “simple” (when a system is inherently low in complexity) and “easy” (when a system is made more complex so it is theoretically easier to use).
ORM’s are easy, but not simple. If your system uses a relational database but not an ORM, you have to understand your particular database and also SQL to understand your data layer. If you add an ORM, you aren’t actually saved from having to understand those things, you just also have to understand your ORM on top of all that. There is some positive tradeoff you get in return, since you don’t necessarily have to grapple with the added complexity all of the time.
The main saving grace of ORM (or at least query builders) seems to be that the alternatives aren’t well-supported in tooling, so the “raw SQL” alternatives often end up implemented as string concatenation hell, which is admittedly terrible. Installing parameterized SQL on the DB itself in stored procedures works great, but you have to go out of your way to do it; it never really comes across as a plug-and-play option even though in principle it easily could be.
ORM libraries may not be simple, but what matters is the code that uses them.
My experience was that the main code was often easier and simpler. Easier because small tasks are made easy. Simpler because it makes the code more consistent: for instance, once you know the ORM, you don't have to deal with hundreds of specific cases that insert or update various records.
Promoting stored procedures as an alternative to the complexity of an ORM seems strange to me. They still have to be carefully written like any raw SQL. There is no canonical way to keep them in sync with the code. They are vendor-specific. As far as I know, you need to learn specific tools to debug them or to analyze their performance.
> Simpler because it makes the code more consistent: for instance, once you know the ORM, you don't have to deal with hundreds of specific cases that insert or update various records.
That's not what 'simpler' means in this context. Simpler means conceptually independent things aren't coupled together. For example, ORMs couple together object-oriented code and relational objects. That's the essential complexity that they introduce.
> There is no canonical way to keep them in sync with the code.
There's still the old-fashioned way of checking them into the repo as part of migrations.
> They are vendor-specific.
Most people will almost certainly stay with a single vendor anyway. If you never take advantage of your platform for fear that you'll migrate away from it, you're throwing away a lot of potential benefits for some perceived future risk.
> As far as I know, you need to learn specific tools to debug them or to analyze their performance.
No more than you would with an ORM–in fact, with an ORM you'd need to learn two different stacks to tune performance or troubleshoot.
> Simpler means conceptually independent things aren't coupled together. For example, ORMs couple together object-oriented code and relational objects. That's the essential complexity that they introduce.
I don't actually agree that this is where the complexity comes from. The complexity comes from what makes ORM so appealing in the first place. Your ORM can dynamically generate an infinite variety of queries based on how you use it, which makes it a big piece of machinery with many moving parts, the behavior of which you will have to understand and manage even if you didn't build it yourself. This machinery may break, it may behave in undesired or unpredicted ways, it may consume compute resources inefficiently, and if it does any of those things, you're still on the hook to take care of it because it affects the behavior of your product.
Ultimately, you will always have to map some behavior in your service code to an intended SQL query and then map the result set back to in-memory data. Doing so in a consistent way can potentially reduce complexity, though in a lot of cases, you can probably get away with treating a SQL result set as a list of associative arrays, or even a list of objects as long as you tell it what class to deserialize into (which is what many ORMs do anyway).
> There's still the old-fashioned way of checking them into the repo as part of migrations.
Yes, even many ORM-based services have DB migrations in the service repo already.[1]
You could also version-control the stored procedures, database migrations, etc. as its own software artifact. You can think of a relational database as a service (microservice?) that speaks SQL instead of HTTP or GraphQL or GRPC. It's listening to a port somewhere on your network, it consumes computational resources, it will be deployed independently of your service in such a way that you have to worry about backwards compatibility, it can become unavailable to your service, and so forth. And like most services, you may prefer to have a defined, optimized, versioned interface to your DB instead of just trusting your consuming services to execute arbitrary (SQL) code. This is not necessarily the right or wrong solution for you, but it's reasonable enough.
[1] One possible niche might be an ORM that builds all of your DB manipulation commands into parameterized SQL at compile time and installs those SQL statements as stored procedures via the migration mechanism. Then you really can write raw SQL when you need to, by hardcoding your stored procedure instead of compiling it. This would work best on a DB that you could migrate whenever you felt like it, but I've heard Postgres is one of those.
> You can think of a relational database as a service (microservice?) that speaks SQL
Exactly. As a colleague of mine once said: stored procedures are the original microservices. A bit tongue-in-cheek, but the point is you can treat stored procs as these lightweight services. You don't even have to write the SQL, just call the procs with the right arguments. That's one scenario, anyway.
Sample size of 1, but ORMs drive me crazy. Why can't we just use SQL? How does it save time when I have to learn the ORM language, which probably has far less support and fewer users?
The OP here says it "reduces boilerplate" -- rarely have I created an application and thought its biggest problem was too much boilerplate.
But everyone at work loves them so I must be wrong somehow.
I think that our love/hate for ORMs stems from the 80/20 rule.
For 80% apps/workloads/programmers, ORMs are pretty helpful (in that they save you computation time, thinking time, etc).
For 20% of cases, the ORM tends to get in the way. In these cases, the ORM doesn't prevent you from needing to know SQL, it doesn't give you performance gains, and it actively obfuscates what's "really going on". These cases make your job harder, and are what stick up in your mind when you think about ORMs.
At my place of work, we use an ORM (SQLAlchemy), but codegen the "annoying/boring" bits of ORM code (model/class specification, table/index creation). IME, it's wonderful and good – until I start wondering about how much memory this query is going to take up, 5 minutes after the client has made it. I know that SQLAlchemy probably caches things, but I have no idea how its cache policy is set. When will my query be garbage collected? How much memory should I request for my app?
I can probably find this out by digging into the SQLAlchemy docs, but I haven't yet so uncertainty looms. TFA's "I'd much rather spend an extra bit of time typing, but this will save me time reading ORM's documentation, optimizing my queries, and most importantly debugging" rings true to me.
With raw SQL, answering these questions might be more straightforward – but I only ask these questions 20% of the time! Anyways, there's some food for thought. :)
They make simple CRUD operations so easy, you don't even have to think about them.
Every remotely competent developer will however tell you that there are always cases where the ORM is a bad fit. And that's exactly why basically all ORMs let you write your own queries.
There is really no reason not to use an ORM. Just don't ever frown on people doing anything more advanced than a simple join without it.
You don't need ORM for simple CRUD operations. All you need is a function you can pass a table name and a map of column names/values you want to insert/update.
Until you want to be able to sort and filter and do all of the other things that most CRUD applications do.
Once you're there, if you're doing SQL directly, now you're manipulating strings in your code, instead of being able to build some kind of object that then generates a query for you.
Not at all. The only cumbersome things about SQL that are common enough and unpleasant enough to always spell out by hand are basic INSERT and UPDATE commands.
I use a set of 3-4 functions that I reuse pretty much in all my programs for this. It's no ORM, as it doesn't map classes to tables/data in the database. It's just a shortcut to generate and execute INSERT/UPDATE commands on arbitrary tables with arbitrary columns.
Also it has zero bugs, because you can hardly create bugs in something so simple.
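For illustration, those helper functions might look something like this in Python (names hypothetical, stdlib sqlite3): a table name plus a map of column names/values, with no class-to-table mapping involved:

```python
import sqlite3

def insert(conn, table, values: dict):
    # Generate "INSERT INTO t (a, b) VALUES (?, ?)" from the dict keys.
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                 tuple(values.values()))

def update(conn, table, values: dict, key: dict):
    # Generate "UPDATE t SET a = ? WHERE k = ?" the same way.
    sets = ", ".join(f"{c} = ?" for c in values)
    where = " AND ".join(f"{c} = ?" for c in key)
    conn.execute(f"UPDATE {table} SET {sets} WHERE {where}",
                 tuple(values.values()) + tuple(key.values()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, label TEXT)")
insert(conn, "items", {"id": 1, "label": "old"})
update(conn, "items", {"label": "new"}, {"id": 1})
print(conn.execute("SELECT label FROM items").fetchone())  # -> ('new',)
```

Everything else -- selects, joins, aggregates -- stays as plain SQL.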
Query builders and ORMs often go hand in hand in my experience, as the ORM will expose the ability to perform custom querying capabilities around the specific object you're working with; however, you're correct in that I am more considering query builders than ORMs directly in this instance.
Reducing boilerplate can be very valuable. Why would you want to spend time writing yet another query when you could be focusing on the parts of the application that deliver value to your customer?
Maintainability is also something to think about. The classes you write for the ORM implicitly document the content model. It can be very hard to decipher the same information from a "show tables" and reading a bunch of SQL SELECT statements, especially if they are all mixed up in the code and broken up with string concatenation and variables. Future maintainers will be able to read the documentation for the ORM which will probably be better than trying to understand a bunch of custom code to deal with complex queries.
For an example of this, notice how in the article the author's "linking table" was nowhere to be found in the ORM section. That got abstracted away! Someone having to analyze the tables directly will be running into those sorts of things all over the place, which is more mental overhead for understanding the system.
I suppose it depends on your flavor of ORM. My experience using Django has been amazing from day one, and support for more exotic operations has improved by orders of magnitude since I last used it.
I think a good ORM is one that strives to make 80-90% of operations ridiculously easy, leaving edge cases to the developer who can always drop down to a raw query as needed. Django does that fairly well, IMHO, and my limited experience with SQLAlchemy felt pretty underwhelming in comparison
EDIT: Though admittedly I did not have access at the RDBMS level to write VIEWs to match my business needs back when I was using Django. If I did, then perhaps I could have chosen that approach – though I would have to forgo all the useful django-admin stuff that comes out of the box, so there's always a tradeoff...
> my limited experience with SQLAlchemy felt pretty underwhelming in comparison
That's interesting, because I usually hear of people having the exact opposite experience. The appeal of SQLAlchemy is that it lets you work at any level arbitrarily—you can use its high-level operations and abstractions for the 80-90% of common cases, but then drop down to its lower level and write complex queries where needed (including CTEs, window functions, stored procedures, etc), without ever resorting to writing SQL by hand. Django's ORM, in comparison, is very limited in the types of queries you can write with it, and is difficult and unintuitive to optimize.
Edit: That said, the Django ORM has improved by leaps and bounds over the past few years. These days, I'd say it gives you a way to cover about 90% of cases in an efficient manner (even if it has a bit of a learning curve). But if you're consistently doing more complex queries, I think SQLAlchemy is vastly superior.
ORM is good when you were going to have to be converting your query results into Python/Ruby/PHP objects defined by your framework. Why write that code yourself if the framework can handle it for you? Particularly if it's a common CRUD app.
I agree with you. I've experience with a few ORM apps, and, frankly, they always suck. In the beginning everything is perfect because an ORM makes writing trivial DB apps trivial. Then your app grows up and needs more features. Now the ORM only gets in the way, and you can't excise it easily. Now you're in a world of hurt.
Frequently the easiest path to perform a complex query with an ORM results in N+1 performance disasters. The inevitable retort is something like "but the ORM has better ways to do that." Of course it does; ORM implementers aren't incompetent. The key word is "easiest," meaning the ORM user doesn't have to spend time learning anything beyond simply chaining method calls. In production such work goes haywire when resolving some related entity causes tens of millions of round trips.
Without an ORM the easiest path is to write a single join, giving the database optimizer a fair shot at correctly optimizing the query, including over time as the database evolves. Again, yes an ORM can encapsulate the equivalent query. Guess what; for every time these ORM capabilities are actually utilized there are probably hundreds of expediently written N+1 queries because the programmer either didn't know better or was indifferent.
To a pragmatic mind this is the inevitable outcome when a tool makes the inefficient solution the easiest thing to do.
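The two shapes are easy to see side by side; here is a small Python sketch (invented schema, stdlib sqlite3) of the N+1 pattern versus the single join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# N+1 shape: one query for the parents, then one more query per parent row.
queries = 0
result = {}
for author_id, name in conn.execute("SELECT id, name FROM author").fetchall():
    queries += 1
    result[name] = [t for (t,) in conn.execute(
        "SELECT title FROM book WHERE author_id = ?", (author_id,))]
print(queries)  # -> 2 extra round trips, one per author

# Join shape: a single query; the optimizer sees the whole problem at once.
rows = conn.execute("""
    SELECT author.name, book.title
    FROM author JOIN book ON book.author_id = author.id
""").fetchall()
print(len(rows))  # -> 3
```

With two authors the difference is invisible; with millions of parent rows the per-row version becomes the round-trip disaster described above.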
I look at this as not playing to core competencies.
SQL's strength is that it's declarative.
By wrapping an ORM around it, suddenly you have a DB engine talking to an ORM engine through a declarative interface. There simply isn't enough information and context passed across the interface to properly optimize.
So in the end, it's like trying to run two optimizing compilers in series, where neither knows the specs of the other.
It's the good old "pit of success" story. When doing the right thing is extra work, you will find countless examples of people doing the wrong thing. The only way to always do the right thing is to make that the easiest possible way.
I've only worked with a couple ORMs, but they all allowed me to write my own queries if I wanted to. The ORM generated 90%, but when there was a performance win I just wrote my own.
Yes, this happens with complicated schemas with complicated queries. ORMs account for this. Almost every major ORM allows you to override those "bad" queries when needed with your own manual SQL calls. It's an old problem with an old solution that works in production.
So, I'm the 'database guy' and I come in when the DB is on fire. I find the bad query(ies) and tell the application developer they need to do it another way. (And suggest ways)
Then they tell me, it's too hard / impossible to do something else, and that's after it took them two weeks to figure out how the query is formed.
For these applications, it's best to ensure their DB is isolated from other DB needs and let the tire fire burn as a signal to others, because I can help a lot with performance, but adding indexes can only help so much.
When I've had similar experiences with people running bad queries from strings, it's a lot easier to get traction on fixes. Even when they think my suggestions are crazy (nobody likes it when I suggest a client side join, however, sometimes it's significantly faster)
I agree mostly. I do like C#/LINQ-based ORMs because, as the name implies, the "ORM" is integrated into the language and treated like a first-class citizen, and it separates the query from the provider that translates it for the destination. But even with LINQ, when you get into complicated queries or you have to do a left join, the syntax leaves a lot to be desired.
Also, it can get obtuse when you try to take advantage of any of its complex features.
I will never choose a real ORM for a green field project. The most I will usually want is something simple that takes in a SQL query and projects the result onto an object without all of the overhead - a Micro ORM like Stack Overflow’s Dapper.
And I agree: by the time you create DTOs, add attributes, and go through the entire ceremony around ORMs, it really doesn't reduce the boilerplate.
I'm currently working on an e-shop. I need to display many tables with different filters.
Administrator wants products starting with a string? OK: if filter['name']: qs = qs.filter ...
Administrator wants products without a category? OK: if filter['no_category']: qs = qs.filter ...
Add multilang support? OK: subclass the model, mark fields as translatable, run the migration script.
I can combine any filters; the ORM automatically creates the query with the necessary joins, subqueries, GROUP BY expressions, etc.
If I didn't use an ORM I would have to write code that generates the SELECT with circa 200 ifs, because almost every filter can generate a join, add fields to the GROUP BY, generate a subselect, etc.
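To make the trade-off concrete, here is a rough sketch of the manual alternative the ORM is replacing: every optional filter becomes another if that appends a WHERE fragment plus its bound parameter. Table and filter names are hypothetical; stdlib sqlite3 only.

```python
import sqlite3

def product_query(filters):
    """Build a filtered product query by hand -- the kind of code an
    ORM queryset would otherwise generate. Hypothetical schema."""
    sql = "SELECT p.id FROM products p"
    where, params = [], []
    if "name_prefix" in filters:          # products starting with a string
        where.append("p.name LIKE ? || '%'")
        params.append(filters["name_prefix"])
    if filters.get("no_category"):        # products without a category
        where.append("p.category_id IS NULL")
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "apple", 1), (2, "apricot", None), (3, "banana", None)])
sql, params = product_query({"name_prefix": "ap", "no_category": True})
rows = conn.execute(sql, params).fetchall()
```

Two filters already need this much ceremony; with dozens of filters, some of which pull in joins or GROUP BY clauses, the "circa 200 ifs" estimate is believable.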
The friction depends on the situation: the number of teams, times the number of DBs, times the number of packages, and the permutations of each (1, 2, many).
When it's 1 team and 1 package, you can get away with almost anything. When it's many DBs, many teams, and many packages, you are going to find it very hard to refactor or change the schema if you don't have a central location for the DB abstraction.
In my experience an ORM will cause issues if used thoughtlessly throughout an application. Objects created by an ORM can be used "naively" and cause many round trips to the database, very inefficient queries, etc., but the boilerplate they can save is really helpful too.
I worked on a simple Python CRUD app project a few years ago with some folks who were absolutely against using an ORM for those reasons. There was so much code written that basically just took tuples returned from the DB and instantiated Python objects with them (and vice versa for inserts) that making changes to anything was absolutely maddening. I find that SQLAlchemy Core can be used very simply to essentially take a DB query and return Python objects.
Have you ever tried it? Your coworkers are right. It also helps you avoid common mistakes like SQL injection. When the ORM doesn't work for a specific case, they usually have ways where you can override the ORM and use your own SQL calls. Virtually every major, popular ORM does this across different languages (can personally confirm with Ruby, Java, & Python ORMs - confident that JS ones support it too).
I'm not sure I understand since there are common cases where you have no choice but to deal with user input. Sure you can manually escape user input in your manual sql calls, but if you're doing that manually there's no hard guarantee that you'll always escape that input vs using an ORM.
I disagree, and SQLi is not simply prevented by avoiding user input. There are many cases where you need to use user input in a SQL query; it's a valid pattern, and it can be done securely with prepared statements or parameterization.
Using an ORM discourages you from writing SQL queries, and it also automatically parameterizes queries. This is a good thing! In fact, from experience, the single easiest way to stop a naive developer from introducing SQLi is requiring them to use an ORM. People should not be constructing SQL statements by hand today; it's too easy to mess this up. SQLi shouldn't be a thing in 2019, but it is.
My point is that it is VERY MUCH not orthogonal; it's very much related. Avoiding naive SQLi? Use an ORM. Directly related.
ORMs build on top of the same mechanisms: prepared statements and parameterization.
Yes, an ORM API can perhaps limit developers to such an extent that they can't construct SQL themselves, and thus can't make the mistakes leading to SQL injection.
The SQL query is a string, and as a policy you don't pass user input into it, not even quoted or escaped. User input is passed "out of band", meaning not as part of the query string; how exactly that happens depends on the RDBMS's particular client/server interface.
That alone is enough to avoid SQL injection. And it has nothing to do with ORMs.
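The out-of-band mechanism described above is ordinary parameter binding. A minimal sqlite3 sketch: the driver sends the hostile value as data, never as query text, so the classic injection string matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# User input is bound as a parameter, not spliced into the query
# string -- even a hostile value stays plain data.
hostile = "alice' OR '1'='1"
injected = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
assert injected == []   # no row is literally named that, so: nothing

rows = conn.execute("SELECT name FROM users WHERE name = ?", ("alice",)).fetchall()
```

Every mainstream driver exposes the same placeholder facility; ORMs simply call it for you.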
Which ORMs make SQL injection easy? In my experience, input sanitization is one of the main features of every major ORM. I am not aware of any popular ORM that makes it easy for SQL injection to occur.
But I never got to use it at work, and thus lost interest. The biggest thing missing with it was that you lost type safety if you had to dynamically build a query. These days its main problem is lack of compatibility with .NET Core, which somebody else was working on for a while.
I think the flat row model for querying data from related tables is just not very good. I'd MUCH rather work with an object model like LINQ, where I get a list of Foos and each one has foo.Bars nested within it. As opposed to the output of a join where I get one row for each Bar, and parent Foo's columns are duplicated across each row.
Entity Framework does an excellent job of mapping the LINQ model to the SQL one, but since they really are different under the covers, it's easy to produce hefty queries. For example a LINQ "group by" frequently cannot be translated to a SQL "group by".
I tried to keep it simple and not deviate too far from straight SQL. But I still didn't want to force processing those damn row-based outputs onto the programmer, so I at least had to add notation for breaking out the columns of a top-level result set into an object hierarchy:
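The general idea (sketched here in plain Python, not Rezoom.SQL's actual notation) is a fold over the flat join output: collapse the duplicated parent columns and collect the child columns into a nested list.

```python
def nest(rows):
    """rows: (foo_id, foo_name, bar_id) tuples from a LEFT JOIN,
    with the parent's columns duplicated on every child row."""
    foos = {}
    for foo_id, foo_name, bar_id in rows:
        # First time we see this parent, create its object once.
        foo = foos.setdefault(foo_id, {"id": foo_id, "name": foo_name, "bars": []})
        if bar_id is not None:            # NULL child from the LEFT JOIN
            foo["bars"].append(bar_id)
    return list(foos.values())

flat = [(1, "a", 10), (1, "a", 11), (2, "b", None)]
nested = nest(flat)
```

Notation for declaring where the hierarchy boundaries fall is essentially a way to generate this fold from the query itself.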
Great work. I used Rezoom in a couple of minor projects, and what I liked most was that I could really trust that a query runs if it compiles, unlike in Entity Framework, where you have unlimited possibilities to write code that compiles just fine but crashes at runtime. In principle I don't like the idea of your primary language getting compiled to SQL, because it's a very leaky abstraction. Instead I like expressing queries as data (ideally compile-time checked).
In Rezoom, query composition wasn't the strongest part. When, for example, some model needed an additional join, you had to modify N queries. This kind of composition could of course be done with dynamics, but, like you say, you lose the biggest selling point: compile-time safety.
However, these compositions are usually all known at compile time, so you should be able to just build them before Rezoom checks them. But out-of-the-box compile-time programming in F# is not there yet...
Thanks. I agree composition was a pain point. I had various ideas to make things better, like adding support for "erased" views/functions/TVFs that would be inlined at compile time, but it always felt like it'd be hacky and still not solve enough problems.
Type providers are such a cool language feature, but the way developing one works is too damn confusing. Especially when you try to publish one as an easy-to-use package and simple stuff like loading dependencies feels like uncharted territory. When it comes to my precious free time I hate, hate, hate figuring out packaging/deployment type stuff, I just want to focus on my code. So that's a big part of why I haven't done a great job maintaining it.
I feel your pain, although to be precise I haven't ever implemented a type provider myself. You have no reason to feel guilty! It's out and it runs without any major bugs(!). Hopefully the community will carry it; at the very least it's one nice showcase for compile-time computing in F# (and maybe it adds motivation to develop that side of the language further).
For my money your type provider was the single coolest one I've seen. The ahead-of-time query execution and planning is a killer feature, and for us the only gaps were either process related (we did not have a good out-of-band migration pipeline) or some constructs that were not understood by your type provider.
I really like Rezoom.SQL even though I haven't done a deep dive yet. But that Navigation Properties syntax is enlightening... I just realized that my idealized query language (WIP) needs something similar.
rails g model Post published_at:datetime title content:text
rails g model Comment post:references author published_at:datetime content:text
rails g model Tag name
rails g model PostTag post:references tag:references
class Post < ApplicationRecord
has_many :comments
has_many :post_tags
has_many :tags, through: :post_tags
end
class Tag < ApplicationRecord
has_many :post_tags
has_many :posts, through: :post_tags
end
You're gonna have to pry the Rails from my cold dead fingers if you want me to maintain a web app's database abstraction layer long-term. No other tool works half as well as ActiveRecord. You could maybe convince me to try out https://rom-rb.org/ but only on a brand new project and only if Rails isn't appropriate.
If it's a project that wasn't built with Rails, I'm going to want https://github.com/jeremyevans/sequel if ActiveRecord is too much trouble to introduce.
I have been lucky enough to be able to pick what I want to use, so I pick a non-Rails stack (Sinatra/Padrino, and now Roda + Sequel). No other tool works half as well as Sequel. You're gonna have to pry Sequel from my cold dead fingers.
Adding an index on tags.name is likely not very useful on its own, but adding an index with a uniqueness constraint to prevent duplicates is probably a good idea. Assuming there are just a few different tag records (<1000), a sequential scan is generally either insignificantly slower than an index scan or even faster.
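The uniqueness-constraint version is a one-liner. A quick sqlite3 sketch (hypothetical tags table) showing the duplicate insert getting rejected at the database level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# A plain index mostly helps read performance on large tables;
# the UNIQUE index additionally enforces the no-duplicate-tags invariant.
conn.execute("CREATE UNIQUE INDEX tags_name_uniq ON tags (name)")

conn.execute("INSERT INTO tags (name) VALUES ('sql')")
try:
    conn.execute("INSERT INTO tags (name) VALUES ('sql')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Enforcing the invariant in the schema means it holds no matter which application, ORM, or ad hoc script writes to the table.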
>Not the subsequent years of maintenance.
I'd much rather maintain this example than a code base replicating the same behavior without an ORM.
Fair enough. I’ll concede that wasn’t a fair comparison to the OP. My point remains that examples focusing on the initial setup are uninteresting (to me) because that’s not where the bulk of the work is.
I like the Django ORM. This is likely for two reasons:
1. I only use Django for small use cases where I rarely see any sort of scope creep. There was no real conscious decision about this, just kind of the way it happens.
2. The Django ORM is fairly mature and makes it quite easy to get a small project out the door.
I regularly use SQL directly at work and wouldn't want to try to replace any of it with an ORM even on new projects because we are all just used to working with it and we wouldn't want to add any complexity to our workflow without real, tangible benefits.
I would not, however, be opposed to working on a larger Django project (ORM and all) if the opportunity presented itself.
I've wanted to like ORMs, but they always get in the way as requirements get more complex. On the other hand, I like writing plain SQL, but that's often hard for other people to comprehend: you cannot just glance through it to know what's happening under the hood.
I find Ecto to be the perfect balance of expressiveness, flexibility and clarity. In fact it's the single reason to have picked up Elixir.
Have to agree. been really enjoying Elixir / Ecto / Phoenix.
I find Ecto lets me be as expressive as I need to with Queries, and handles the mapping into models very well for the 90% of easy cases that I need it to.
One thing I really miss from the JVM world was something like jOOQ.
I don't want an ORM (like specifically the object relational mapping stuff) for most of my use cases but I do want an abstraction above text for interacting with SQL.
gorm's sql builder is alright but something better would be really nice.
I don't even want an abstraction for building SQL unless I need to build queries dynamically from some other (presumably simpler) query language.
Put all the SQL in .sql files and make all queries parametrized. Or wrap all queries in VIEWs or functions at the RDBMS. This makes maintenance much easier.
> Or wrap all queries in VIEWs or functions at the RDBMS.
This is like the oldest best practice for RDBMS use, for security, and for decoupling consuming apps from each other and the DBs low level implementation (so that app views and base tables can evolve independently to the extent possible), and for maintainability: all app access to the DB should be through views adapted to the apps needs.
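The .sql-files suggestion above needs very little machinery. A hedged sketch of a HugSQL-style loader (the query file is inlined as a string to keep the example self-contained; the names are made up): each "-- name:" header introduces one parameterized query.

```python
import sqlite3

# queries.sql would normally live on disk; inlined here so the
# sketch is self-contained.
QUERIES_SQL = """\
-- name: user_by_id
SELECT id, name FROM users WHERE id = :id

-- name: all_users
SELECT id, name FROM users ORDER BY id
"""

def load_queries(text):
    """Split a .sql file into {name: sql} on '-- name:' markers."""
    queries, name, body = {}, None, []
    for line in text.splitlines():
        if line.startswith("-- name:"):
            if name:
                queries[name] = "\n".join(body).strip()
            name, body = line.split(":", 1)[1].strip(), []
        else:
            body.append(line)
    if name:
        queries[name] = "\n".join(body).strip()
    return queries

queries = load_queries(QUERIES_SQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
row = conn.execute(queries["user_by_id"], {"id": 1}).fetchone()
```

The SQL stays reviewable as SQL, the parameters stay bound, and libraries like HugSQL (Clojure) or aiosql (Python) provide a polished version of the same pattern.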
Agreed. Really, just an open/transparent SQL builder part and an uncoupled row scanner that can populate structs reflectively is the perfect middleground to work around the impedance mismatch that inevitably occurs using these magic ORMs.
I was referring to the string composition [0] of psycopg2, a library which also takes care of converting SQL results to the correct Python data type. IMO that meets the criteria for an "abstraction above text" that GP was asking for.
But yes, SQLAlchemy Core is far more full-featured and gives you more than just a few primitives to work with. It also has a lot of library-specific constructs and is much closer to a full-fledged "ORM" than a query builder; what you're writing is not exactly SQL and is yet another dialect to learn.
I don't have any issue with ORMs in principle, and even wrote an ORM once. However, in almost every place I've seen them used they've become a way for developers to avoid understanding how databases work, inevitably leading to inexplicable data models and poor performance. In practice, ORMs tend to end up creating crippling technical debt that is difficult to fix.
If ORMs were typically used by developers that fully understood the implications for the underlying database, there would be few issues with them and they would be a valuable tool. Unfortunately, they seem to be most popular with developers that are the opposite of that description.
> in almost every place I've seen them used they've become a way for developers to avoid understanding how databases work
This is what ORMs and NoSQL have in common -- they're created by very smart people for very smart reasons, but the bulk of people I encounter advocating for them on the projects I work on are just intimidated by SQL and don't know or care what kind of burden they're taking on to avoid it. My opinion based on my experiences with ORMs is that to work successfully with an ORM you need to know SQL just as well, and probably better, than you would without it. Plus you need to know the ORM. So the typical adoption story I've seen is that people spend an inordinate amount of time struggling with their ORM and never lose faith that dealing more directly with SQL would be worse. The more they struggle, the more they're happy that they went with an ORM. "Gosh, who knew this would take six months. Imagine if we had to do it without Hibernate!" Uh, yeah, I can imagine writing ten tables and a dozen different joins on them without Hibernate. I can even imagine it taking less than six months and not needing an "ORM architect" who spends his afternoons trying to figure out why Hibernate isn't generating the efficient query he thought it would, and who spends standup telling scary campfire stories to the junior devs about the SQL horrors his Hibernate heroism is protecting them from.
It's true that my ORM experience mostly comes from a fairly corporate environment that I've steered clear of for years, and I've never seen a small team of strong contributors make use of an ORM. I think this difference in experience might explain why ORMs are so polarizing. I've also never found how fast I can bang out boilerplate SQL to be a limiting factor for a task at any scale, so there's probably a set of problems and situations I've never had to deal with where an ORM comes in handy.
Frankly that says more about the people you've worked with than the underlying technology.
I work with devs that have over a decade of Django experience. We use the ORM because it's just ridiculously easier to write queries on it and because SQL is impossible to compose without substantial problems.
90% of the code we write is CRUD and API endpoints. There's no reason to write SQL by hand except for the complex aggregations that comprise 5% of our queries, with luck.
Ah, I see what you mean. I was thinking in terms of union'ing and the like where you can use set theory to compose sets of information from various queries.
Well, I'm not the OP. AFAIK, he may be thinking the same as you.
SQL lacks some power around negative and consolidated joins, forcing one to write more complex queries than necessary. It is this way for good reasons: those are exactly the kinds of joins that indexes help least and that most hinder parallelism, so they should be avoided if possible. In my experience it's not a large drawback, but YMMV.
I believe that people who hate ORMs haven't used a good ORM, so what they really hate is the tool, not the concept. I like using my ORM of choice when it's appropriate, and I like using plain SQL statements when that's appropriate. Choose the tool appropriate for the task.
I've used many ORMs. I don't hate them, but I think their net utility is negative.
While you initially might think that interfacing with the db is going to be very tedious and labor intensive, it usually turns out to not be that bad.
Battling the ORM to make it do what you want can on the other hand be very tedious.
When you have the SQL in the code it's much more obvious what the performance profile and potential concurrency problems are. That's very important to solve real problems which arise in almost all projects with some scale.
If I were to choose my tools, I would pick some sql-based schema versioning system, and potentially a simple data mapper for moving data back and forth between db records and entities.
But I would be very reluctant to use a full-blown ORM nowadays.
I tend to disagree. To me a good ORM totally (OK, not totally, but as totally as possible) abstracts the database away from your code. That's its purpose. You just want to work with data: fetch it, persist it, delete it. You don't really want to care about how that's done. That's where a good ORM can really help you out.
However, if the ORM gets in your way, either because of API complexity, performance issues or other reasons, and you start to "battle" it as you said, you immediately move out of that context and start thinking in a more data oriented context. That's where SQL comes in.
As I said before, I think it's a question of using the appropriate tool. I've written hundreds of applications using multiple different ORMs and just plain SQL—and I'll tell you I'll pick a good ORM over raw SQL for basic CRUD operations every time.
However, I'll give you that I'll pick plain SQL over a bad ORM any day.
I've used many ORM's in multiple languages and on multiple teams and have never had a good experience with any of them, once the complexity grew beyond trivial.
I've been tasked with various database optimisation tasks over the years, and ORMs have always gotten in my way when it came to understanding a query's indexing and locking and improving/fixing it. I always end up just looking at the logged SQL, fixing it in SQL, and then trying to convert it back to ORM code, which has always been tedious. In my experience, in multiple otherwise talented teams, ORMs seem to encourage less-than-ideal database modelling and overfetching of data, because it's too easy to just make an ORM class do it for you rather than really thinking about your data and the data model you need. This isn't the ORM's fault of course, but rather the teams using them; but it's happened so frequently that I don't think "discipline" or "better developers" is a valid answer. [1] I used to hate databases and dread working with them for the above reasons.
I recently started loving databases by working with Clojure and the HugSQL library. The reason I love this approach is that Clojure isn't OO, so there's no object mapping: I deal with plain old data instead, in whatever format I make the query return it (and I can easily transform that data further if I need to), and HugSQL lets me write (composable and parameterised) raw SQL. It's fantastic and has been super productive for me. I can think in pure data, model the database how it makes sense, and still have a good understanding of the indexing and locking of my queries.
[1] As jandrewrogers said: "in almost every place I've seen them used they've become a way for developers to avoid understanding how databases work, inevitably leading to inexplicable data models and poor performance. In practice, ORMs tend to end up creating crippling technical debt", and this has been my experience too. Sure, you can use an ORM well, but in reality this seems to happen so rarely that, in my opinion, ORMs are a net negative to be avoided if possible. Saying that you just gotta use them right (when few people seem to actually do so) seems like a bit of a no-true-Scotsman argument to me.
And FWIW, I've written some moderately complex apps over the years, and routinely use Hibernate / JPA (or GORM when using Grails) and I've never, or very rarely, encountered most of the problems cited by critics of ORMs. And Hibernate makes it easy enough to call raw SQL when you need to (and yes, I have needed that facility a few times).
By and large, for the vast majority of projects that I can imagine working on, my default starting assumption is almost always going to be "use an ORM" -- unless something about the situation strongly dictates otherwise.
An ORM is a technology that has the problem of trying to make easy things easier (CRUD) while making difficult things more difficult (complex joins, SQL lock management, indices, etc.). Why spend so much $$$$$ on a fancy database if you are going to make 70-90% of its features impossible to use?
Fine grained control over locks, transaction levels, JSON indexing, etc. In fact, some databases (older versions of Postgres, Oracle, etc.) do not actually have true serializability so explicit locks may be the only way to implement certain logic correctly. I picked on explicit locks since it is quite useful as well as being generally poorly supported by ORMs.
I'm in total agreement with you. I've tried different ORMs in different languages and I am always disappointed in how lackluster they feel in comparison to the Django ORM.
There are several things that bother me about using Django, but because of the ORM and its tight integration with the rest of the system, I don't think there is a more productive environment for me.
That being said I'm not a great developer. As a business-minded person who codes out of necessity, I find the Django tooling to be the right way to work most of the time.
Not having to deal with migrations is a thing of beauty.
Learning and using an ORM is not an excuse for not learning and knowing SQL and database concepts. (I'm not saying the author argued that; I just want to sweep away a straw man that says not having to learn SQL is an argument in favor of using an ORM.)
For me the biggest benefit of using an ORM is separation of concerns. I truly hate it when SQL and query building get mixed into controllers and other parts of the application. However, I also hate the layered complexity that the author talked about, which definitely means ORM's are not appropriate for all use cases! But if you don't use an ORM you need to be diligent with using some other pattern and code organization to keep your code clean and your concerns separated.
I worked at a consulting company for a number of years and we got immense value out of Symfony, which includes the Doctrine ORM. Many of our projects boiled down to custom Content Management Systems, so we got the huge benefit of being able to define the data model in code and not worry about any of the SQL while also not having very complex queries. So we got pretty much maximum benefit and minimal exposure to the down-sides.
I will say that these systems almost always wanted some detailed reporting, and that I usually got tasked with writing them. They always had some complicated queries and I almost always made use of the "raw sql" option to just write my own queries and spit out CSV. Which was always a much different use case than the rest of the application.
I was pretty excited when I originally learned about ORMs. But I quickly found that their usefulness is very limited in the enterprise, where most of my career has been spent.
You will likely find, as I did, that enterprises literally have thousands of databases of all kinds and many crazy data models.
They have tomes of huge SQL procedures that are still to this day running their business processes. Maybe even a nice older gent that is the only one that really understands how any of that shit works?
If this sounds familiar to you, then don't waste too much time on ORMs. In my experience, they're only good for mapping out your POJO or POCO objects and performing simple CRUD operations.
Use them to stand up simple REST routes that do CRUD on your entities. That can be the API that you use to do development and testing, and possibly even use those CRUD routes to do larger business operations.
But in reality, the business "requirement" is going to come down to you, the developer, in the form of some old terrible blob of shit SQL. Two thousand lines. Ten thousand lines.
You're probably not going to bother deciphering all that and re-writing it. The ORM is going to barf all over itself if you even try to use its API to execute that query. Even if you could express all those joins and case statements via the ORM, it is going to pick an execution plan where the query finishes sometime after the next Hanukkah.
At this point, don't bother with the ORM. Drop down to some driver where you can just run raw SQL and run the raw SQL.
Create a REST route that just calls this whack query directly.
This all may sound cynical and it is. But I know what I'm talking about. Do yourself a favor and remember what I said.
I have written several queries that span hundreds of lines, primarily for "advanced" search functionality that lets you search by a huge array of potential parameters. Without using the query builder provided by the ORM the code would become intractable to mortals quickly. Having several dozen different parameters that are only set under certain special conditions is much easier if you can use your regular programming language to evaluate those conditions as opposed to writing everything in one big SQL statement even with named parameters.
Writing code like this here really gets old:
q = query("SELECT a FROM table WHERE (:PARAMETER_1_DISABLED OR table.field < :PARAMETER_1) ... 20 other parameters")
q.setParameter("PARAMETER_1_DISABLED", condition)
q.setParameter("PARAMETER_1", parameter)
// ...20 other parameters
compared to just
Type.list() {
    if (condition) {
        field < parameter
    }
    // ...20 other parameters
}
Also please don't tell me to avoid this problem by concatenating the SQL conditions to dynamically create the query... The end result is still ugly and might even introduce security problems.
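The query-builder pattern being defended here is itself small enough to sketch. A toy chainable builder (hypothetical, plain Python) where each optional search parameter conditionally contributes one parameterized predicate, so you never concatenate raw values into the SQL:

```python
class Select:
    """Toy query builder: each .where() adds a parameterized predicate,
    keeping optional search filters readable and injection-safe."""
    def __init__(self, table, columns="*"):
        self.table, self.columns = table, columns
        self.predicates, self.params = [], []

    def where(self, predicate, *params):
        self.predicates.append(predicate)
        self.params.extend(params)
        return self                       # chainable

    def sql(self):
        text = f"SELECT {self.columns} FROM {self.table}"
        if self.predicates:
            text += " WHERE " + " AND ".join(self.predicates)
        return text, self.params

# Optional parameters only contribute a predicate when actually set.
q = Select("people", "id")
age, name = 30, None
if age is not None:
    q.where("age < ?", age)
if name is not None:
    q.where("name = ?", name)
sql, params = q.sql()
```

Real builders (jOOQ, SQLAlchemy Core, squirrel) add joins, grouping, and dialect handling on top, but this is the core shape that makes a few dozen conditional filters tractable.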
I want to write SQL but have my library map between my language's built-in types and my SQL types. I feel like something like Dapper is the correct abstraction for communicating with a database.
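That Dapper-style mapper is only a few lines in most languages. A hedged Python sketch (hypothetical User type, stdlib sqlite3) that runs hand-written SQL and maps each row onto a dataclass by column name:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def query_as(conn, cls, sql, params=()):
    """Dapper-style sketch: run SQL, construct one cls per row by
    matching column names to constructor arguments."""
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    return [cls(**dict(zip(names, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
users = query_as(conn, User, "SELECT id, name FROM users")
```

You keep full control of the SQL; the library only owns the tedious row-to-object plumbing.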
I really like Sequelize after using it in a couple of small projects. Not quite that concise, but I find it to be a really nice balance of concision and flexibility.
Person.findOne({where: {name: "Grandma L."}})
Person.findAll({where: {age: 30}})
Particularly, I really like how associations are handled. If I have a one-to-many association between Posts and Comments, I can do something like this:
Post.findOne({where: {title: "My Article"}})
.then( article => {
article.getComments()
.then( comments => {
// do something with all comments of Post "My Article"
});
});
Ha! Glad to see Peewee mentioned. In two bullet points: predictable APIs, composable pieces... or, learn once, apply anywhere. Or jOOQ, but in Python. Take your pick!
If code is your thing, Peewee has quite a few examples.
Personally I think an active record style ORM for Go like gorm is a poor fit for a language that doesn't come across as inherently OOP. Going through some of the documentation for gorm, it seems to rely heavily on method chaining which for Go seems wrong considering how errors are handled in that language. In my opinion, an ORM should be as idiomatic to the language as possible.
I've used sqlx[1] before, and it feels pretty idiomatic to Go. You tag your structs with their respective database columns, write up a query, and hand it to sqlx to perform the deserialisation of the data. I've also come across squirrel[2] too, though I haven't used it, it does look rather interesting.
For my 2¢, We use squirrel at work (I haven't touched it myself though). People here seem to like it.
It's not an ORM per se, but it seems to occupy the sweet spot you're describing. It takes away some of the more tedious parts of using SQL, but allows you to still reason about what's happening under the hood without committing tons of documentation to memory.
Just say no to ORM.
Apart from the usual problems with impedance mismatch, there's one thing that is rarely talked about: Waste.
Most ORMs work like this: you get a request, you start a transaction, and you begin instantiating lots of objects, because you need those to do anything useful.
You then either serialize parts of that object tree into JSON or whatever, or you change a few objects, which then create UPSERT statements, the transaction commits (or not) and then...you are throwing it all away again!
You might save some state in some second level cache, but other than that, you just created a partially populated object graph, probably loaded way too much stuff, and then you do that AGAIN for the next transaction.
I've used JDO, Hibernate, etc. before, and nowadays it feels extremely wasteful, with limited productivity gains that quickly disappear as your objects become more complex.
I tend to agree, but in static languages, reflection engines are usually needed to automate or simplify mapping, and reflection can get messy. Static languages are not designed for CRUD, in my opinion (without a revolution in frameworks, at least). Look at the square-bracket "annotations" or "attributes" in C#. Isn't that the kind of info that classes or object instances were supposed to define, such as fieldx.maxLength = 30;? Square brackets are a kludge.
Both... I prefer a hybrid approach when it comes to ORMs. In most projects I will use an ORM to avoid boilerplate for simple entity queries (save, findById, findByX, delete) and for converting a database row to a class instance, but I avoid mapping complex relations.
An ORM like Spring Data with JPA/Hibernate in the Java/Kotlin world works well for this. I write my db schema by hand, create an entity class with the same fields and an empty repository interface extending CrudRepository. This gives me simple CRUD access to the table with almost no code.
When I need complex queries I inject a JdbcRepository which allows me to query the DB using standard SQL and a RowMapper lambda.
Yep, the Spring Data CrudRepository interface based stuff is an amazing time saver. Just extend an interface, and BOOM, you've got the basic CRUD operations ready to go. And then you just use HQL in annotations for more specialized queries. Other people's mileage may vary, but I've found this to be a tremendous boon.
Apart from the well-thought-out "proper arguments" like "you don't know what queries it runs underneath...", it's usually the insane-funny sub-language you have to learn to work the said ORM, which only handles 50% of the queries before you really have to get weird with it.
My not-correct and suboptimal solution over the years has been to have some sort of "god class" holding a big collection of SQL-related queries, with some query reuse (error checking etc.), like "student_insert_update(namedList, namedValues)"
or "student_get_all_subjects(student_id, semester)".
I think the most common problem with ORMs is that people don't know when to use the ORM's functionality and when to write their own query. This always leads to N+1 problems and unmaintainable code, because when you eventually find those problems, they usually get quick-fixed, since there's no time to fix all of them. That leaves ad hoc code for the next person trying to make changes. This is generally not the ORM's fault but the programmer's, for not understanding those limitations. Just replacing the same code with "raw" SQL would produce the same thing, except you'd have to write all of it yourself.
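The N+1 shape mentioned here is easy to demonstrate. A sqlite3 sketch (hypothetical posts/comments schema) counting round trips for the naive lazy-loading pattern versus the single JOIN an ORM's eager-loading option would issue:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES (1), (2), (3);
    INSERT INTO comments VALUES (10, 1), (11, 1), (12, 3);
""")

# N+1 shape: one query for the posts, then one more per post --
# the pattern naive lazy-loading ORM code tends to generate.
queries = 0
post_ids = [pid for (pid,) in conn.execute("SELECT id FROM posts")]
queries += 1
for pid in post_ids:
    conn.execute("SELECT id FROM comments WHERE post_id = ?", (pid,)).fetchall()
    queries += 1
n_plus_one = queries            # 1 + len(post_ids)

# Same data in a single round trip with a JOIN (what eager loading,
# e.g. Django's select_related/prefetch_related, would produce).
rows = conn.execute("""
    SELECT p.id, c.id FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id
""").fetchall()
```

With 3 posts the lazy version already issues 4 queries; with 10,000 rows the difference is the outage.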
Another advantage that I've come across, and that has not been mentioned so far, is that an ORM can help you a LOT when implementing role-based access control (RBAC).
Mapping the objects and having all the metadata has huge advantages over pure SQL, if convention over configuration is used to control access to the given objects with a User/Role entity. And object inheritance gives you the ability to add a "created_by"/"modified_by" to each table, controlling access even at the row level...
Or HugSQL, which is similar, but actively maintained. Coupled with Clojure's data-centric focus, there's no need for a traditional ORM and you get to work directly in SQL. Yesql and Hugsql made me enjoy using databases for the first time ever.
I like to mix ORM and non-ORM: for anything involving multiple joins and more complex stuff, I'll just write raw SQL.
Using Laravel as an example, I'll usually make a $sql var to hold the query, then just run DB::select(DB::raw($sql)); It still keeps things nice and succinct: I let DB handle some of the setup/connection boilerplate and figure out the SQL myself, but 80% of the time I rely on Eloquent.
The benefit of relying on Eloquent is that I like to tie into the event system. For example, when adding/deleting a user that has other related tables created at the same time as "setup", you have to be explicitly aware of that; instead, you can tap into the "creating/created/saving/saved/etc." events, and when a new user is "created" you could create a profile, for example.
For me, the killer feature is keeping an event cycle when it matters. Sure, you can explicitly remember to update/map everything when you hand-code your SQL, but is everyone on the team going to do that? If, for example, you update profile->fullname whenever user->firstname or user->lastname is updated, that's easy with an ORM; not so much when 15 different devs are updating things in different places in different ways using SQL only. Every time ->save() or ->create() is called, it runs the events you put in play. (This is just an example; I'd probably just use a getter that combines first/last name instead of having duplicate data, but again, that only works if you're using Eloquent to grab the data in the first place, because it's model-related.)
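The event mechanism itself is language-agnostic. Here's a toy Python sketch of the idea; the Model/User classes and the "created" event name are invented for illustration, not Eloquent's actual API:

```python
from collections import defaultdict

class Model:
    """Tiny active-record-ish base class with model events: callbacks
    registered once fire on every save, no matter who wrote the caller."""
    _hooks = defaultdict(list)

    @classmethod
    def on(cls, event, callback):
        cls._hooks[(cls, event)].append(callback)

    def save(self):
        # Persisting to the DB would happen here; then fire the event so
        # every code path gets the same side effects.
        for callback in self._hooks[(type(self), "created")]:
            callback(self)

class User(Model):
    def __init__(self, first, last):
        self.first, self.last = first, last
        self.profile = None

# Register once; every User creation now builds a profile automatically.
User.on("created", lambda u: setattr(u, "profile", f"{u.first} {u.last}"))

u = User("Ada", "Lovelace")
u.save()
print(u.profile)  # Ada Lovelace
```

The value is centralization: the side effect lives next to the model, not scattered across every call site that happens to insert a row.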
What a great article; this paragraph really resonated with me:
> Any situation where complex functionality is wrapped in another layer runs the risk of increasing the overall complexity when the wrapping layer is itself complicated. This often comes along with leaky abstractions - wherein the wrapping layer can't do a perfect job wrapping the underlying functionality, and forces programmers to fight with both layers simultaneously.
I think we can all agree that a mapping layer from a relational DB to your language is needed in almost every application, so the question is whether to build it yourself or use an off-the-shelf solution.
I work at prisma.io and we've set out to make database workflows as easy as possible for developers, this includes database access (= ORM) and schema migrations.
With polyglot persistence becoming a standard pattern in modern applications, today's ORMs fall short since they only map from relational databases. Our goal at Prisma is to provide a type-safe mapping layer to all your databases at once. This will allow developers to focus on their application logic and not worry about how to access the data from the various data sources.
The render() method generated SQL; the Select class allowed a _fairly typed_ input; with some evolution, you can get shared queries quite nicely.
This works _reasonably well_. Part of _why_ I went this route is that I tend to use Postgres in a very serious way: it's not just a fancy set of spreadsheets with FK linking: I have indexes, check constraints, enums, pg-specific types, triggers, etc. So being able to directly interact with the database gives you a rich control surface that ORMs tend to exclude you from.
By the way, this was an evolution in concept from a different database oriented system I wrote in Python. Both systems used Postgres rich types, the class/render system was implemented in Scala and was Very Nice.
I agree with that sentiment. Concatenating strings is miserable and error-prone, especially for highly dynamic queries.
But there's still another step between a full-blown ORM and a query builder, where you still define the models and the relations between them, but fall back to a plain query builder for constructing queries. That way, you're not dealing with plain arrays, etc., but you're still essentially writing SQL. Like Objection.js[0], which is built on Knex.
Shameless plug: if you're using TypeScript, check out https://www.npmjs.com/package/@wwwouter/typed-knex
I love Knex, but was missing the type safety, so I created a TypeScript wrapper around it.
If you look at the code, you can see I started out as a C# developer :)
It's being used in production for a few projects, but it would be great if more people started using it.
Feedback is more than welcome!
What is not obvious if you have never seen this before is that it dynamically connects to your database WHILE you are coding and automatically gives you auto-completion, fully understands the types, and will give compile errors if your code doesn't match the database. Feels like magic (or LINQPad, if you have ever used that).
The only downside is you need a DB available for your CI build tools (which is not too difficult to do). Some F# people will also opt for a lightweight mapper like Dapper.
The number one reason to use SQL is not a reason not to use an ORM. SQL is extremely powerful: it can make your data sit up, roll over, play dead and even speak. Learning how to harness that power is worthwhile. So maybe use an ORM now, while you are learning SQL.
Or use a "pure orm" (mapping to plain business objects) without a query builder - thus writing native sql and receiving nicely structured objects from the results (eg https://github.com/craigmichaelmartin/pure-orm)
I fell out with a lot of the popular ORMs and decided to create a very lightweight query library with the features I really wanted, namely an ENTIRELY statically typed query interface.
The idea is that it should be impossible to write an invalid query, and also to ensure the returning data type is exactly representative of what has been queried.
Currently it uses knex.js to actually generate the SQL behind the scenes. It's experimental, but I'm already using it in a couple of projects. I enjoy the fact that my queries have so little boilerplate now. :-)
For a toy project I used Postgres to generate JSON objects for me and a helper function in ruby (db_helper) that parses the JSON into ruby data structures so that the OP example can be written with much less boilerplate code as:
def db_all_posts_in_tag(tag_id)
  db_helper("
    SELECT json_agg(r)
    FROM (
      SELECT post.post_id, post.published, post.title, post.content
      FROM post
      INNER JOIN post_tag ON post.post_id = post_tag.post_id
      WHERE post_tag.tag_id = $1
    ) AS r;
  ", [tag_id])
end
The ORM discussion is really just a proxy for "Have you had lots of experience managing data in production?" (in which case: against ORMs), and also "Have you had lots of experience managing data with a team of green engineers?" (who almost all want to use an ORM because it's cool, but also: hell no).
Now, about the only way for a developer to understand how awful they are is to really spend some time going all out with them. So my advice would be to embrace ORMs wholeheartedly, as early in your career as possible. As you debug them, you'll learn lots of wonderful things.
The desire for the object mapping (to pure objects, not db-aware objects), while having an aversion to query building APIs led me to use https://github.com/craigmichaelmartin/pure-orm - where native, unobstructed SQL is written in a "data access layer", which returns pure "business objects" to be used in the app's business layer.
I quite like the middle ground of a micro-ORM. It gets rid of the trivial boilerplate like populating fields from query results, and still allows any custom queries.
I am quite disappointed that both the article and the comments don't mention application security. USE AN ORM. Unless you are a hotshot SQL dev who knows the intricacies of every RDBMS, you should be delegating this to the ORM. It's there to prevent you from blowing your foot off by introducing SQLi vectors. Use an ORM, avoid SQLi.
Using an ORM to enforce security is like wearing your motorbike helmet down to the store in case a bird shits on your head.
Yes, it will help prevent bad developers from introducing sql injection vectors, but with a whole lot of extra baggage coming along for the ride.
And there are other application security scenarios involving database where the ORM gives you nothing, such as always forcing a "tenant = " filter in a SaaS scenario (which some database engines do support).
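One application-side way to make the tenant rule hard to forget is a query helper that refuses to run unscoped queries against tenant tables. Here's a rough Python sketch with stdlib sqlite3; the table name, the guard heuristic, and the helper are all invented for illustration, and DB-side row-level security is the more robust option where the engine supports it:

```python
import sqlite3

# Tables that must always be tenant-scoped (illustrative list).
TENANT_TABLES = {"document"}

def tenant_query(conn, tenant_id, sql, params=()):
    """Refuse to run a query against a tenant-scoped table unless it
    carries an explicit tenant filter. A blunt string check, but it makes
    the 'forgot the WHERE tenant_id' bug fail loudly instead of leaking
    another tenant's data."""
    lowered = sql.lower()
    if any(t in lowered for t in TENANT_TABLES) and "tenant_id = ?" not in lowered:
        raise ValueError("query against tenant table without tenant filter")
    return conn.execute(sql, (*params, tenant_id)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, tenant_id INTEGER, title TEXT);
    INSERT INTO document VALUES (1, 10, 'a'), (2, 20, 'b');
""")

rows = tenant_query(conn, 10, "SELECT title FROM document WHERE tenant_id = ?")
print(rows)  # [('a',)]
```

In Postgres, CREATE POLICY with row-level security moves the same guarantee into the database itself, where no application code path can bypass it.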
Something like myBatis, jOOQ or Dapper is more than good enough for the boilerplate of building queries and data mapping.
No need for needlessly generated SQL queries, with data going across the wire, when it should be processed server-side, taking advantage of the RDBMS's capabilities.
Liking to write SQL can easily make you think ORMs tend to produce more mistakes or spiral out of control, but in fact you can fix specific bottlenecks later and mix in raw queries without abandoning the ORM entirely.
You should decide on the type of data storage for a given domain and then decide on the patterns required to support that domain. The danger of using an ORM is that you can easily hide critical business functionality.
Sorry for being off topic but is there an ORM or any other tool that can rewrite messy SQL, ideally replacing right joins with left ones at the same time?
I have several hundred of these that I am afraid to touch.
Is there a solution in Python/Postgres where I don't have to have ORM'd tables, but that much of the boilerplate SQL CRUD-type things can be reduced? Basically just beyond writing raw sql.
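One possible shape for this is a small helper that maps raw SQL rows into dataclasses by column name, Dapper-style. Here's a sketch using stdlib sqlite3 so it's self-contained; with Postgres, a psycopg2 cursor would play the same role, and the Post table is invented:

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class Post:
    post_id: int
    title: str

def fetch_all(conn, cls, sql, params=()):
    """Map raw SQL results into dataclass instances by column name:
    you write the SQL, the helper does the mapping, no ORM'd tables."""
    conn.row_factory = sqlite3.Row
    names = {f.name for f in fields(cls)}
    return [cls(**{k: row[k] for k in row.keys() if k in names})
            for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (post_id INTEGER PRIMARY KEY, title TEXT, content TEXT);
    INSERT INTO post (title, content) VALUES ('hello', 'body');
""")

posts = fetch_all(conn, Post, "SELECT post_id, title FROM post")
print(posts[0].title)  # hello
```

This is roughly what the lightweight mappers mentioned elsewhere in the thread do: the mapping boilerplate disappears, but every query is still SQL you wrote.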
Go with an ORM when your schema is still in flux, but transition to raw SQL when you can; the tech debt and performance overhead are just insane otherwise.
Perhaps there are performant ORMs, but in my experience ones like SQLAlchemy are terribly slow, in that they make it very easy to end up with not-so-optimized queries.
I’d wager though that hand crafted queries tuned with the right indexes and the like are going to be much quicker than any abstraction like an ORM. If you’re using it as just a query builder then that’s something else.
In the project I'm currently working on, objects are fetched from the database using ORM queries and then the results are iterated over, filtered further and additional queries are made to fetch more data. This is then all sent to the frontend, where only about two or three of the couple dozen fields are actually used.
I found this code because I wanted to fix an issue where, if you sorted the list in the UI, only the current page got sorted (because the sorting happened in the frontend only). So I wanted to change the query so that sorting and paging are both done in the database. What a rabbit hole that was! I was able to refactor the code to not over-fetch quite as badly and to move the filtering into the initial query (so only additional data fetching happens after the initial query now, but it does still over-fetch fields, and that's not easy to change since this function is called by many endpoints and the set of fields they all want varies). I had to log all of the queries so that I could construct a new query and then convert that back to ORM code. It was a very tedious task, but I managed to get the query from taking 9 seconds to under a second.
In a previous project, I was tasked with optimising the database, because requests were incredibly slow. The issue was that no care had been taken 1) with indexes -- indexed fields were too large, and 2) with locking. The locking was the big one as there were some requests that caused entire tables or large amounts of rows to become locked, which in turn would prevent other transactions from completing, so otherwise fast queries were now also slow, which in turn... you get the idea. This was in a large well established codebase, so it was not easy to fix and it still wasn't fully fixed when I left that team.
Many of these issues could be avoided by considering the data and access patterns when designing your database model (which doesn't have to be identical to your application model -- database should IMHO be modeled around query patterns and application model around domain features), with some care taken for indexing and locking. You want to make sure that different users usage patterns overlap as little as possible to avoid locking, that your indexes are small, that your queries fetch only what you need (both in terms of columns and rows).
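For instance, sorting and paging belong in the query, along with a column list that matches what the page actually shows. A small sqlite3 sketch of the idea (the post table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
# 50 rows, each with a wide 1 KB body column we don't need for the listing.
conn.executemany("INSERT INTO post (title, body) VALUES (?, ?)",
                 [(f"t{i}", "x" * 1000) for i in range(50)])

# Sort and page in the database, fetching only the columns the page needs,
# instead of pulling 50 wide rows into the app and sorting/slicing there.
page = conn.execute("""
    SELECT id, title FROM post
    ORDER BY title DESC
    LIMIT 10 OFFSET 0
""").fetchall()
print(len(page))  # 10
```

The same query issued through an ORM's default "give me the model objects" path would typically drag every column of every row across the wire first.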
None of these issues are the ORM's fault per se, but the ORM encourages developers to think in terms of their application model and their primary programming language's OO facilities. So people (in my personal experience, at least) tend to mix database queries and application logic too much (like what I described above: pull data from the DB, do some filtering that maybe should have been in the query, do more queries that maybe or maybe not should have been joins), tend to over-fetch (often the ORM pulls in too many columns unless you make it not do so, which often isn't done), treat indexes as an afterthought, and don't consider locking at all. I think because ORMs make it look like "just some more application code", these things are often overlooked, while with SQL queries it's a little more obvious that they're executed differently. I would have said it's down to junior developers who just haven't learned to take care when writing queries, but I've only ever worked with two people who actually did this, and I can't believe that the rest of all of the teams were just inexperienced.
I'm no database expert, but I've managed big wins with small amounts of care and by understanding what queries I'm running.
PS: The book SQL Performance Explained is quite good, in my opinion, and if you use Postgres, then Mastering PostgreSQL 11 is great (and it starts off talking about transactions and locking).
This fits my experience as well. Tracing the queries SQLAlchemy builds based on the models of our Python application, the DB is returning all of the columns for every model instead of just the columns that will then be returned via the API to the client. They're not even just the queries to compute the relation, but all the columns from every relation involved, and then at return time only a small subset of columns is used. It's terrible for query perf, as we almost never get index-only scans because we also have to pull data from the other columns. It's a waste. I'm not sure how to fix that just yet without breaking the ease of use. The idea of being able to say model.select_one_or_none()... or something similar is, I can easily tell, quick and painless and easy to train people on, but it doesn't do the database justice. And then I hear the DB is slow! It's a battle I'm losing, but I hope to help improve where I can.
This sounds exactly like my current project. The ORM is pulling all fields for the model objects and related objects instead of the two or three it actually needs and there’s no easy fix because too many unrelated features make use of the same query code, but use different fields. I’ll slowly change it to using multiple single purpose queries probably, but it’s going to be a tedious task.
As an experiment, I changed one to only fetch the fields that it needs and the query ran in approx. 15% of the time. That’s a pretty big improvement! Sigh.
Every time I show up at a company with flask-sqlalchemy, step #1 is to remove it. Raw SQLAlchemy is better, but we always end up with SQL. It's probably my personal bias from writing 4 SQL parsers, but I find it better than the alternative. Although it is very nice to have something which can safely map the types.
Most of what I said still holds for me (after reading the article, and after using an ORM this time).
I haven't programmed much golang. But with Python + Django I can use `./manage.py shell_plus` and can drop immediately into a shell with history, readline support, syntax highlighting, and tab completion, with all model objects in local scope.
I can then do lookups and annotations (yes, they're expensive and not for production, but they save lots of time during the day in a pinch).
When the data model of a project gets bigger, writing queries by hand gets harder to think about. Perhaps it's due to my own overreliance on ORMs, but I can't fathom how I'd manage without them. In practice the data models are far more complex than above, and hand-writing those queries would take a lot of time (not to mention mental energy that could be put to use elsewhere).
Also, there are ways around performance problems. For one, it may mean rejecting features offered by the ORM, such as Django's content types and multi-table inheritance, in favor of abstract inheritance (basically just reusing common fields) or even plain one-to-one relations. They're just too complicated and slow at scale.
Same goes for extending the ORM with stuff like django-polymorphic (and django-model-utils). I've gotten nice performance out of django-polymorphic (and cleaner code), but it's hiding a ton of metaprogramming, there's still a penalty "upcasting" naive models, and they're very burdensome to maintain if APIs fall out of date.
Next is trimming down queries with prefetching, only(), and direct ID lookups. This prevents multiple queries from piling up by doing the join ahead of time, getting only the fields asked for, and doing the fastest lookup for objects after a more expensive "filtering" query has run.
The query from earlier can then be trimmed with .only() so it fetches just the fields that are needed.
And finally, for debugging certain types of query performance, django-debug-toolbar is nice. For API calls, django-silk can "look back" on background requests.
The article was fair IMO, but I'd wager an ORM's payoff depends on the ecosystem. Software being about tradeoffs, the convenience outweighs the downsides every time for me. Maybe I'm due to be humbled and find I'm not getting the big picture, and it could be just plain learnt dependence, but it'd be a step back in productivity for me not to have an ORM.
(tl;dr: there is an impedance mismatch / leaky abstraction between relational data and OOP for all but the most trivial use cases, guaranteeing that whatever problems an ORM might solve will be traded for new ORM problems instead)
I wonder how people of Vietnamese heritage would feel about you characterizing their country as synonymous with war and disaster? IIUC Vietnam has been a relatively peaceful country for almost 50 years.
Not coincidentally ORMs have been doing great since that silly article was written so many years ago.
Of course I will never use an ORM again. SQLAlchemy made my life so miserable that I felt like the only stupid person on the entire planet. What the ORM folks don't acknowledge is that SQL IS NOT OBJECT-ORIENTED. Why would you want to pretend something completely fake?
The problem doesn't end there: using an ORM makes you think object-oriented, which is the most corrosive idea since the dawn of computer science. It makes you write spaghetti code.
You read your code 90% of the time and write it only 10% of the time; that's why clean functional code results in better software.