What ORMs have taught me: just learn SQL (2014)

JesseAldridge · on Sept 21, 2019

This was my position for a while. ORMs introduce a layer of magic which obscures what's actually going on under the hood. I decided I would just make raw SQL queries and handle mapping data explicitly.

I quickly ended up with a lot of duplicated code. So then I thought, "Well ok, I should add a bit of abstraction on top of this..." I started coding some simple functions to help map the tabular data to objects. One thing led to another and suddenly I looked at what I had done and said, "Wait a minute..."

kccqzy · on Sept 21, 2019

There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.

As an example, the SQLAlchemy docs[0] make this very clear: there's an ORM, but there's also just a core expression library that simply helps you connect to the database and construct queries.

[0]: https://docs.sqlalchemy.org/en/13/
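To make the "helper functions" side of that distinction concrete, here is a minimal sketch in TypeScript. The names `sql`, `Query`, `Book` and `mapBook` are made up for illustration and are not taken from SQLAlchemy or any other library; the `$1`-style placeholders assume a Postgres-like driver.

```typescript
// Hypothetical helper names; a sketch, not any library's API.
interface Query {
  text: string;      // SQL with $1, $2, ... placeholders
  values: unknown[]; // parameters, in placeholder order
}

// Tagged template: interpolated values become numbered parameters,
// so they never end up inside the SQL text itself.
function sql(strings: TemplateStringsArray, ...values: unknown[]): Query {
  let text = strings[0];
  values.forEach((_, i) => {
    text += `$${i + 1}${strings[i + 1]}`;
  });
  return { text, values };
}

// Explicit, visible mapping from a raw row to an application type.
interface Book {
  id: number;
  title: string;
  authorId: number;
}

function mapBook(row: Record<string, unknown>): Book {
  return {
    id: Number(row["id"]),
    title: String(row["title"]),
    authorId: Number(row["author_id"]),
  };
}

const authorId = 42;
const query = sql`SELECT * FROM books WHERE author_id = ${authorId} AND title LIKE ${"%SQL%"}`;
// query.text   is "SELECT * FROM books WHERE author_id = $1 AND title LIKE $2"
// query.values is [42, "%SQL%"]

const book = mapBook({ id: "7", title: "Learn SQL", author_id: 42 });
```

Nothing here is persisted automatically: the query is still plain SQL, and the only "mapping" is a function you can read in ten seconds.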

gmac · on Sept 21, 2019

Agreed. Helpers (and indeed types) can make working with SQL an actual pleasure. You do need to learn the SQL, though. (My TypeScript/Postgres solution, in this vein: https://github.com/jawj/mostly-ormless/blob/master/README.md).

GordonS · on Sept 21, 2019

I've never been a fan of codegen, but I think I could get past it for this library - it looks great!

I love how it lets you use SQL, while taking full advantage of TypeScript's wonderful typing system to give you intellisense and compile-time checking. Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).

I really like the way the readme has been written too - it gives a real insight into the thought processes that led to the final result.

gmac · on Sept 21, 2019

> I've never been a fan of codegen, but I think I could get past it for this library - it looks great!

Good news. :)

Maybe we can agree that typegen smells less than codegen?

> Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).

I should look that up — sounds interesting.

Attained · on Sept 21, 2019

I worked for a bit on a codegen-based TypeScript/Postgres builder, but haven't had time lately to build it out - https://github.com/Sammons/morbid

I really think TypeScript would benefit from a good solution to this.

ThinkBeat · on Sept 21, 2019

I do like the code-gen solution a lot. You can create code that is far less bloated than a generic framework.

I am old fashioned. I like to start with the database schema and generate code from that. When I make a change in the schema, I regen the code. Thanks to partial classes in C#, I can persist customizations between code-gens if necessary.

timmy-turner · on Sept 21, 2019

Wow this is great! Very well written README.

What just blew me away is the thing with the `JOIN` and the `to_jsonb(authors)`, all with complete typing support for the nested author object. I was actually looking to use a classical, attribute-driven query generator (with the sort of chaining API everyone is used to: `tableName.select(...columns)` etc.) for my next project, which may involve replacing/wrapping/rewriting a Rails app and its ORM with TypeScript and Node. Maybe I'll try this instead; I'm already half sold. Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper.

kilburn · on Sept 21, 2019

I'm also awed by this!

> Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper.

My current team is pretty junior, and I don't see any problem with this. Simple SQL queries are really easy to learn, and complex queries are harder to understand with ORMs than in raw SQL.

Moreover, knowing SQL is a useful, marketable skill that will stay relevant for many years to come. If there's some resistance, I can easily convince the team that going this route will benefit them personally.

Back to the README, there are two questions I'd like to see addressed:

1. Whether `Selectable[]` can be used to query for a subset of fields and how.

2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?

I would love to see this move forward! I will definitely play with it and consider it for my next project.

gmac · on Sept 21, 2019

> I'm also awed by this!

:)

> 1. Whether `Selectable[]` can be used to query for a subset of fields and how.

Right — this is not (currently) supported. I guess if you had wide tables of large values, this could be an important optimisation, but it hasn't been a need for me as yet.

> 2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?

Multiple authors with the same id isn't going to happen, since id is intended as a primary key, so I'd argue that the example as given isn't brittle. On the other hand, there's a fair question about what happens for many-to-many joins, and since my use case hasn't yet required this I haven't given it much thought.

gmac · on Sept 22, 2019

OK, I gave the one-to-many queries a bit more thought, and the converse join query (getting each author with all their books, rather than all books each with their author) works nicely with a GROUP BY:

    type authorBookSQL = s.authors.SQL | s.books.SQL;
    type authorBookSelectable = s.authors.Selectable & { books: s.books.Selectable[] };

    const
      query = db.sql<authorBookSQL>`
        SELECT ${"authors"}.*, jsonb_agg(${"books"}.*) AS ${"books"}
        FROM ${"books"} JOIN ${"authors"} 
          ON ${"authors"}.${"id"} = ${"books"}.${"authorId"}
        GROUP BY ${"authors"}.${"id"}`,

      authorBooks: authorBookSelectable[] = await query.run(db.pool);

This exploits the fact that selecting all fields is, logically enough, permitted when grouping by primary key (https://www.postgresql.org/docs/current/sql-select.html#SQL-... and https://dba.stackexchange.com/questions/158015/why-can-i-sel...)

I'll update demo.ts and README shortly.
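For anyone less familiar with `jsonb_agg`, the shape that comes back from the grouped query can be sketched in plain TypeScript. This is only an in-memory illustration: the `Author` and `Book` interfaces and the sample rows are invented here, and they are not the generated `s.authors.Selectable` / `s.books.Selectable` types.

```typescript
interface Author { id: number; authorName: string }
interface Book { id: number; authorId: number; title: string }
type AuthorWithBooks = Author & { books: Book[] };

// In-memory equivalent of: JOIN books to authors, GROUP BY author id,
// and collect each group's book rows into an array (the job jsonb_agg does).
function groupBooksByAuthor(authors: Author[], books: Book[]): AuthorWithBooks[] {
  const byAuthor = new Map<number, AuthorWithBooks>();
  for (const b of books) {
    const author = authors.find(a => a.id === b.authorId);
    if (!author) continue; // inner join: books without a matching author are dropped
    let group = byAuthor.get(author.id);
    if (!group) {
      group = { ...author, books: [] };
      byAuthor.set(author.id, group);
    }
    group.books.push(b);
  }
  return Array.from(byAuthor.values());
}

const authors: Author[] = [
  { id: 1, authorName: "Ann" },
  { id: 2, authorName: "Bob" },
];
const books: Book[] = [
  { id: 10, authorId: 1, title: "First" },
  { id: 11, authorId: 1, title: "Second" },
  { id: 12, authorId: 2, title: "Third" },
];
const authorBooks = groupBooksByAuthor(authors, books);
// One element per author; each element carries an array of that author's books.
```

Because the query starts from `books` and uses an inner join, an author with no books does not appear in the result at all, which the sketch mirrors.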

timmy-turner · on Sept 21, 2019

Right, only querying a few fields seems not to be a builtin feature. Looks like you have to create the partial selectable type yourself and there is no support to typecheck that the correct columns in the select are included.

For your second case, if I recall this correctly (ActiveRecord made my SQL skills fade away), the plain JOIN would just return another row with the same book but a different author. `to_jsonb(authors.*)` is just operating on a single row. But what you want is possible (aggregating rows into a JSON array) by using `jsonb_agg`. Whether the lib supports inferring the correct typings for that is another question though.

GordonS · on Sept 21, 2019

> Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper

I'd argue that learning SQL is essential for any developer.

It's also a "reusable" skill that will stand them in good stead for decades - whereas learning how to use the fancy wrapper is only useful until the next new shiny comes along.

ljm · on Sept 21, 2019

I’d add that it’s essential so you can understand how to optimise and debug a query. You lose a lot of power if you can’t open up a console to describe or explain things.

The long-standing ORMs do a pretty decent job of writing efficient queries these days though. You can go pretty far without knowing much and that’s not a bad thing either.

victor106 · on Sept 21, 2019

This looks great. It looks similar to jOOQ.

AtlasBarfed · on Sept 21, 2019

I'm not even someone that has used multiple orm styles extensively, but it is disturbing/darkly humorous how many orm libs there are.

That last post: you can map a half dozen Java frameworks to each of the acts.

Personally I never found an orm that tracked which attrs in objects were actually mutated so that only mutated columns would be updated/inserted, but again I never did a lot of orm.
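One way such tracking could work, sketched in TypeScript under loud assumptions: keep a snapshot of the row as it was loaded, diff it against the current state, and emit an `UPDATE` that only mentions the columns that differ. `changedColumns` and `buildUpdate` are hypothetical names, not any ORM's API, and the `$1` placeholders assume Postgres.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Compare the row as loaded with the row as it is now, and
// return only the columns whose values differ.
function changedColumns(original: Row, current: Row): Row {
  const changes: Row = {};
  for (const column of Object.keys(current)) {
    if (current[column] !== original[column]) {
      changes[column] = current[column];
    }
  }
  return changes;
}

// Build a parameterized UPDATE touching only the changed columns.
// Returns null when nothing changed, so no statement needs to run.
// Table and column names are assumed to come from trusted code, not user input.
function buildUpdate(
  table: string,
  id: number,
  original: Row,
  current: Row
): { text: string; values: (string | number | boolean | null)[] } | null {
  const changes = changedColumns(original, current);
  const columns = Object.keys(changes);
  if (columns.length === 0) return null;
  const assignments = columns.map((c, i) => `"${c}" = $${i + 1}`);
  const values = columns.map(c => changes[c]);
  return {
    text: `UPDATE "${table}" SET ${assignments.join(", ")} WHERE "id" = $${columns.length + 1}`,
    values: [...values, id],
  };
}

const loaded: Row = { title: "Learn SQL", price: 30, inStock: true };
const edited: Row = { ...loaded, price: 25 };
const update = buildUpdate("books", 7, loaded, edited);
// update.text   is 'UPDATE "books" SET "price" = $1 WHERE "id" = $2'
// update.values is [25, 7]
```

The cost is holding a second copy of every loaded row, which is part of why this kind of bookkeeping tends to live inside heavier ORMs rather than thin helpers.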

liquidify · on Sept 22, 2019

I really like this. There is only one thing that bothered me. I'd rather pass the job to the pool than pass the pool to the job... something like...

    const existingBooks = await pool.exe(select("books", { authorId }));

or

    const existingBooks = await pool(select("books", { authorId }));

moring · on Sept 21, 2019

QueryDSL (http://www.querydsl.com/) does something like this for Java. It can generate classes from tables, but even with those, all queries / statements that hit the database are manually built using a query builder to avoid syntax and type errors. I.e. no caching or automatic database updates.

nikolasburk · on Sept 21, 2019

knex.js is another example of such a query builder library in the Node.js ecosystem

nikolasburk · on Sept 21, 2019

Fully agree! Some mapping code is _required_ in your application, otherwise it wouldn't be able to talk to your database at all.

However, I've never understood why people write this mapping code manually. I believe in code generation tooling as a potential solution for this (where types and maybe a full data access API is auto-generated based on the database schema).

thymanl23 · on Sept 21, 2019

Yeah, ORMs have grown to mean more than mapping relational data to objects. An example of "just" this mapping can be seen in PureORM[0].

[0]: https://github.com/craigmichaelmartin/pure-orm

runeks · on Sept 21, 2019

> There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.

What's the big difference? Why do you like the former but not the latter? What are the characteristics of the former that make it distinct from the latter?

kls · on Sept 21, 2019

Out of curiosity, what platform and tech were you using? I am assuming a predominantly OO one, based on the virtues of ORM. I have always found that when I try to build back-end or middleware platforms with OO-dominant languages (read Java, C#, et al.), an impedance mismatch quickly appears, and any communication with the database becomes a monster of mapping OO philosophy to relational theory, whether via a home-rolled layer or an ORM.

That being said, I personally have found that I do not like OO languages for back-end dev, and I find that functional languages such as any variety of LISP marry extremely well to the transactional and process-oriented nature of back-end systems, as well as lending themselves to not having to jump through hoops to contort the data into relational sets (Clojure's destructuring is an absolute life saver here). I find that there is little to no duplication of code in regards to transferring data to the DB. You may want to give Clojure or F# (depending on your stack) a try for your back end and see if it does not alleviate a host of issues with trying to develop a process- and transaction-oriented system, which is what most back ends are.

I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read jQuery, React Hooks), turns to spaghetti rather quickly.

If you are using OO languages to communicate and transfer data to the DB, you may very well be trying to work around an impedance mismatch that is easily solved by using a functional language.

vikeri · on Sept 21, 2019

I can recommend hugsql to anyone who wants to work with SQL in Clojure. You basically write pure SQL with some minor templating helpers and then you get the data in a map with Clojure data types. Very nice and minimal overhead: https://www.hugsql.org/

zachrose · on Sept 21, 2019

> I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read jQuery, React Hooks), turns to spaghetti rather quickly.

What alternatives have you tried?

kls · on Sept 21, 2019

ClojureScript, Reflex, Grapefruit, Seesaw and a host of others. It's my opinion (so take it with a grain of salt), and it very well may be the way my brain works, but I just find functional to not marry well to UI development. For the service and process oriented stuff associated with the front end I think it is great, but when it comes to modeling components, I find objects and inheritance work far better.

This is one of the reasons I have long been a huge proponent of JavaScript despite its warts, as it can be OO when I need it to be OO and functional when I need it to be functional.

mecameron · on Sept 22, 2019

I've used ClojureScript with re-frame professionally for over 2 years, and it's been the first time in over 20 years of development that I've enjoyed working on the front-end. Unidirectional data flow with a single app atom has been a dream to work with.

JesseAldridge · on Sept 21, 2019

It was a Flask app using SQLAlchemy (so Python). I'm not sure functional programming would have changed the situation much. I imagine there would still be repeated patterns involving reading and writing to the database in slightly different ways, and it would still make sense to use some sort of library. But I haven't used functional languages much, so I can't say for sure either way.

didibus · on Sept 21, 2019

Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app. So you don't need an ORM. You might still have a library to help you build dynamic SQL queries, but no object-relational mapping needed.

dragonwriter · on Sept 21, 2019

> Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app.

You can do that in an OO language, too; relations (whether constant or variable, and whether their data is held locally by the program or remotely, as in an RDBMS) are perfectly valid objects.

_8ljf · on Sept 21, 2019

Query builders != ORMs.

A query builder carefully preserves the underlying relational and RPC semantics and exposes all of that to the user in an easier-to-use form. That’s just good cautious modest abstraction.

An ORM believes it knows way better than those dumb RDBs how a database ought to behave by ingeniously pretending that everything you’re dealing with is just nice simple familiar arrays of local native class instances. Which, like all lies, ends up spawning more lies as each one starts to crack under inspection, until the whole rotten pile catastrophically collapses under the weight of its own total bullshit.

And of course it goes without saying which of these approaches our arrogant grandstanding consequence-shirking industry most likes to adopt.

Mikhail_Edoshin · on Sept 21, 2019

But now you know it's not magic :)

Maybe the problem is not that you don't need a mapping layer, but that ORMs are obscure. And maybe they are obscure not because SQL is such a cursed spot, but because object-oriented programming ITSELF drifts toward obscurity and magic. Don't you get the same feeling of obscurity about other libraries, e.g. web servers or clients? I often find the bare specs much clearer than (supposedly simplified) OO libraries that implement them.

olau · on Sept 21, 2019

Yes.

If you stick to using the ORM for what amounts to (mostly) just PODs, it's syntactic sugar that can really help readability.

vbezhenar · on Sept 21, 2019

Sometimes you just should live with duplicated code. It's OK.

cjfd · on Sept 21, 2019

No, it is not okay. If you need insert/update/select for every object/table, that is way too much duplication. It becomes very irritating when the schema changes. There should be table metadata in such a case, but one should also know what one is doing. Having no idea that under the hood 14 joins are done is not a good situation either.

ABeeSea · on Sept 21, 2019

Well sort of. In my view, duplicate SQL chunks that have defined business logic should either be a new table/view or extremely well-documented with really rigid communication policies for changes.

For example, take a company with many data analysts/scientists who may each be writing their own queries. If the definition of some “very important” company metric changes, a large number of dispersed queries would need to change.

But an ORM isn’t the answer for the above situation either.

lugg · on Sept 21, 2019

It's relative, if the duplication is that large maybe you do need to abstract.

It also sounds like you would be well served using a service abstraction at that point to remove the data layer from client scope entirely.

The "model changes, now we have to change it everywhere" problem isn't going to be solved by abstraction. It's only limited by how much you're willing to restrict access to the underlying model; if you need that information, you need to share the model.

The best solution to this I've seen in practice is domain modelling, colocating shared code near its other users. When things get too distant, you start using anti-corruption layers, which allow more flexible model changes.

But at the end of the day this is essential complexity, orm, or any other solution is never going to be able to hide the fact that you need information elsewhere in the system to be useful.

kieckerjan · on Sept 21, 2019

Thank you! You just said something I very much agree with but never dared to say out loud.

__MatrixMan__ · on Sept 21, 2019

I think that the way to say it without starting a war is to preface it with something like:

> Well, redundancy and dependency both have downsides, but in this case...

dymk · on Sept 21, 2019

But you shouldn’t live with a hand-rolled pseudo ORM that stumbled into existence when there are well-developed alternatives.

lugg · on Sept 21, 2019

Why not?

I've used hand rolled pseudo ORMs before.

I prefer just plain SQL but for the application I had there was a common access pattern that was worth abstracting out in a DRY sense.

That doesn't mean I want or need a complete ORM. Just consistent access to certain table types.

scarface74 · on Sept 21, 2019

I’ve never seen a homegrown ORM that was better than a third party one. Whenever there is an issue - and there are always issues - you have to dig into the code, because they are never documented well.

There is usually a feature that no one thought about and then you have to make modifications to the custom ORM and you get an even bigger mess.

lugg · on Sept 21, 2019

Better is a subjective term.

I wouldn't say what I built was a better ORM. But I would 100% say it's a better solution to the problem I faced.

It didn't get in the way of writing SQL, but it did reduce the boilerplate and repetitive gruntwork.

"Issues" - no, it was a function call deep and incredibly clear and close to what was happening.

LeonB · on Sept 21, 2019

sort of agree but also: the good third party ORMs started life as homegrown ORMs.

collyw · on Sept 21, 2019

Usually third party stuff has documentation. Home grown stuff often hasn't. And usually better tested.

scarface74 · on Sept 21, 2019

Entity Framework didn’t.....

notyourday · on Sept 21, 2019

You just said "you know, if you wanted to have a shitty burger, you can get it right there for half the price of a national chain and it will be at least as good"

mbell · on Sept 21, 2019

> I've used hand rolled pseudo ORMs before.

Is your pseudo ORM as well documented as a commonly used ORM? Can I google (or search your wiki) for common issues?

pknopf · on Sept 21, 2019

There's a middle ground.

Micro ORMs.

chii · on Sept 21, 2019

A micro ORM is just an ORM, written well and modularly. It isn't a middle ground - it's choosing to use a well written library.

A lot of people conflate ORMs leaking because of a poorly designed library with ORMs being a bad abstraction in general.

scarface74 · on Sept 21, 2019

One of the most popular Micro ORMs for C# is Dapper which is used by Stack Overflow.

There is no real abstraction. You write standard SQL and it maps your recordset to an object. You know exactly what code is running.

There are extensions that will take a POCO object and create an insert statement and I believe updates, but where ORMs usually get obtuse and do magic are Selects. It’s hard to generate a suboptimal Insert or Update.

ct520 · on Sept 21, 2019

So.. pattern I see emerging. Use orm for the common stuff and execute sql for complicated queries (like reports)

JohnBooty · on Sept 21, 2019

Ruby on Rails’ ActiveRecord, for all its heft, is excellent at this. You can use raw SQL any time you like. It was an explicit design goal from day 1.

There are times I dislike things about it, and it can be quite heavy, but it’s very easy to mix and match ActiveRecord ORM code and raw SQL even within a single model class.

Trisell · on Sept 21, 2019

That’s how I’ve done it on my last two projects. We used TypeORM for the standard repeated simple queries, and then wrote custom SQL for our complicated queries that the ORM failed at and then just executed them with the ORM. It was really nice and made for easier table refactors because we didn’t have to go through and audit every query that was calling that table.

Rapzid · on Sept 21, 2019

TypeORM is a step in the right direction for JS ORMs, but it's like 1/8th of the way there IMHO. A nearly fully typed ORM is possible now with TypeScript, and of course proxies are out now. TypeORM was doing too much ad hoc string building under the covers as well. I believe a SQL AST is the way to go. It can be transformed and compiled to database-specific SQL, allowing for things like predicate pushdown, optimization, and a sane way to implement DB-specific optimizations and extensions.
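A toy version of the AST idea, to show why it beats string concatenation: the query is data first and only becomes text at the last step, where a dialect decides on quoting and placeholders. Everything below (`SelectNode`, `Dialect`, `compile`) is a hypothetical sketch rather than TypeORM's or anyone else's design, and identifier quoting is simplified (embedded quote characters are not escaped).

```typescript
// A deliberately tiny AST: SELECT columns FROM table WHERE <conditions ANDed>.
type Condition = { column: string; op: "=" | "<" | ">"; value: string | number };

interface SelectNode {
  table: string;
  columns: string[];
  where: Condition[];
}

interface Dialect {
  quote(identifier: string): string;  // how identifiers are quoted
  placeholder(index: number): string; // how the n-th parameter is written (1-based)
}

const postgres: Dialect = {
  quote: id => `"${id}"`,
  placeholder: n => `$${n}`,
};

const mysql: Dialect = {
  quote: id => "`" + id + "`",
  placeholder: () => "?",
};

// Walk the AST once, collecting parameters and emitting dialect-specific text.
function compile(
  node: SelectNode,
  dialect: Dialect
): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [];
  const columns = node.columns.map(c => dialect.quote(c)).join(", ");
  let text = `SELECT ${columns} FROM ${dialect.quote(node.table)}`;
  const conditions = node.where.map(cond => {
    values.push(cond.value);
    return `${dialect.quote(cond.column)} ${cond.op} ${dialect.placeholder(values.length)}`;
  });
  if (conditions.length > 0) text += ` WHERE ${conditions.join(" AND ")}`;
  return { text, values };
}

const ast: SelectNode = {
  table: "books",
  columns: ["id", "title"],
  where: [
    { column: "authorId", op: "=", value: 42 },
    { column: "price", op: "<", value: 30 },
  ],
};

const pg = compile(ast, postgres);
// pg.text is 'SELECT "id", "title" FROM "books" WHERE "authorId" = $1 AND "price" < $2'
const my = compile(ast, mysql);
// my.text is 'SELECT `id`, `title` FROM `books` WHERE `authorId` = ? AND `price` < ?'
```

Transformations such as pushing a predicate into a subquery would operate on `SelectNode` values before `compile` runs, which is the part that is painful to do on strings.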

nikolasburk · on Sept 21, 2019

I really like your point about type-safety! I think one major issue with the current ORMs in the Node.js/TypeScript ecosystem (Sequelize, TypeORM, ...) is that they're not fully type-safe. As you mention, TypeORM is definitely a step in the right direction here, but the way it leverages the TypeScript type system isn't where it could be. I work at Prisma where we're building Photon.js[1], an auto-generated database client with a fully type-safe data access API.

[1]: https://github.com/prisma/photonjs

leosarev · on Sept 21, 2019

Actually, DDD recommends that domain objects in general should be loaded by ID only, and that complicated queries for grids should load projections.

scarface74 · on Sept 21, 2019

And for ETL and data loads. An ORM isn’t going to usually do a multi insert statement or an update that involves more than one table.
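For reference, a multi-row insert is easy enough to build by hand. A minimal sketch, assuming Postgres-style `$n` placeholders and trusted table/column names; `buildMultiInsert` is a made-up helper, not part of any ORM or driver:

```typescript
type Value = string | number | boolean | null;

// Build one INSERT statement with a VALUES tuple per row.
// Table and column names are assumed to be trusted; only values are parameterized.
function buildMultiInsert(
  table: string,
  columns: string[],
  rows: Value[][]
): { text: string; values: Value[] } {
  if (rows.length === 0) throw new Error("at least one row is required");
  const values: Value[] = [];
  const tuples = rows.map(row => {
    if (row.length !== columns.length) throw new Error("row length must match column count");
    const placeholders = row.map(v => {
      values.push(v);
      return `$${values.length}`; // placeholders are numbered across all rows
    });
    return `(${placeholders.join(", ")})`;
  });
  const columnList = columns.map(c => `"${c}"`).join(", ");
  return {
    text: `INSERT INTO "${table}" (${columnList}) VALUES ${tuples.join(", ")}`,
    values,
  };
}

const insert = buildMultiInsert("books", ["title", "authorId"], [
  ["Learn SQL", 1],
  ["More SQL", 1],
  ["No ORM", 2],
]);
// insert.text is 'INSERT INTO "books" ("title", "authorId") VALUES ($1, $2), ($3, $4), ($5, $6)'
```

One statement and one round trip for the whole batch, instead of one `INSERT` per object.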

DougBTX · on Sept 21, 2019

The informal definition I have of a micro ORM is an ORM without an identity map and without lazy fetching through proxy properties. Are there any more concrete definitions?
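In case "identity map" is unfamiliar, here is roughly what it means, as a sketch with invented names (`IdentityMap`, `resolve`) rather than any particular ORM's implementation:

```typescript
interface Author { id: number; authorName: string }

// Identity map: at most one in-memory object per primary key.
class IdentityMap<T extends { id: number }> {
  private readonly byId = new Map<number, T>();

  // Return the already-tracked object for this id if there is one;
  // otherwise start tracking the freshly loaded row and return it.
  resolve(row: T): T {
    const existing = this.byId.get(row.id);
    if (existing) return existing;
    this.byId.set(row.id, row);
    return row;
  }
}

const identityMap = new IdentityMap<Author>();
// Two separate query results that happen to contain the same author row.
const first = identityMap.resolve({ id: 1, authorName: "Ann" });
const second = identityMap.resolve({ id: 1, authorName: "Ann" });
// With the map, first === second, so a change made through one reference
// is visible through the other.
first.authorName = "Anne";
```

Without the map, each query hands back an independent plain object and nothing keeps the copies in sync, which is the trade-off the informal definition above is pointing at.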

sjwright · on Sept 21, 2019

> There's a middle ground.

> Micro ORMs.

And there's the mystical fourth option of simply not bothering with objects in the first place. No objects, no need to do object-relational mapping.

goatlover · on Sept 21, 2019

So then you’re mapping to whatever data structure your app uses instead of objects. In OOP languages like Python, everything is some type of object anyway.

sjwright · on Sept 22, 2019

That’s just a semantic game. If your language returns the result of a query as a generic array of generic dictionaries (or whatever), that isn’t mapping, nor is it object oriented in principle.

goatlover · on Sept 22, 2019

But then your generic array of generic dictionaries needs to be mapped to whatever data structures make sense for your application. ORMs save you that step.

sjwright · on Sept 22, 2019

I've written many apps and I've never experienced what you've described. I write my queries to return exactly the right data in exactly the right format in exactly the right order—so I can go straight from the generic data structures to the screen interface or document layout.

Nothing says the structure which makes sense for an application can't be a generic array of generic dictionaries.

If your favourite programming language forces you to go through the silly hoops of data mapping in order to do useful things with query output, I can understand why an ORM might make sense for you.

avereveard · on Sept 21, 2019

also implicit row mappers so one can still do the queries manually without writing all the oo glue

rmilejczz · on Sept 21, 2019

An in house solution is almost always better than an external dependency

inimino · on Sept 21, 2019

This is correct. An in-house solution is a solution developed in-house for your specific problem, which no one else has ever had exactly. The more specific the need, the more the benefit of the made-to-measure solution. The alternatives are something your organization didn't develop, which may be better, but you don't know how to use it, or may be worse, but you don't know that when you pick it, or may be slower, but you don't know that when you start using it, or may have vendor lock in, but you don't know that when they sell it to you as "open", or may have hidden pitfalls, but they aren't in the glossy brochure, or may be unmaintained by anyone except your org in ten years, but you can't know that until ten years from now, or may be full of security holes because it was developed by idiots, but you can't know that because you didn't see who wrote it, or might be full of solid security features and a great design cleverly compromised by a hidden flaw placed in a specification you haven't read by a nation state, but you don't see that because why would you, or... etc etc etc. <sarcasm>But don't worry, at least you didn't have to understand the problem space well enough to be able to sit down and solve it yourself, so you sure saved some effort there!</>

elsjaako · on Sept 21, 2019

Isn't this an argument against using any library at all?

inimino · on Sept 21, 2019

Yes!

Which brings us to the topic of tradeoffs and the synthesis of balance, by way of weighing competing advantages and costs fairly.

On the one hand, code you must write and understand. On the other, code someone else wrote, that you can just use. There is no clear winner here. It's always a tradeoff.

paulmd · on Sept 21, 2019

Haha, so enterprise-chat.

I rewrote our entire database layer in Hibernate for our (incredibly complex monolith) webapp. Then I was tasked with rewriting a major core piece of search functionality that builds a query from user selections / saved queries.

I was told that contrary to previous work, Hibernate Criteria would not be allowed, since it was deprecated. Hibernate's official replacement for programmatic queries is JPA Criteria, but Hibernate's support for this was not feature equivalent to Hibernate Criteria, so this was out too.

So what I got the green light to go on was rewriting my own pseudo-ORM wrapper that generates HQL query strings and parameters. HQL is not deprecated, you see.

It's ended up working out moderately well. It's a thin layer, and as long as you avoid the rough edges it actually works fairly well, while also providing a convenient point to translate query language from the old Kodo format into Hibernate (cringe, code smell).

There have been times I've had to do some very awkward query shit that I've only managed to lever in via HQL. You have no idea, views on top of views.

No idea what'll happen after I leave, that's their problem!

Thanks for the job security, Hibernate team. Your incredibly-poorly-executed transition from a well-supported standard to the "new new" has been exquisitely great for my job security.

Bet there's Python3 devs who feel the same way!

lol768 · on Sept 21, 2019

>JPA Criteria

Feature parity aside, I also found this to perhaps be the most verbose API for query building I've ever used.

nullspace · on Sept 21, 2019

Hah... I’m probably falling for Poe’s law here, but anyways... there are certainly cases where in-house is better than external dependency - specifically when your team knows the tech domain better than anyone external can... but in general well-maintained (preferably open source with a community, or a well funded company) external dependencies are almost always better. They usually would have the years of fixing edge cases and features that you would inevitably run into if you were to roll your own.

rmilejczz · on Sept 21, 2019

They’re also tailoring their solution to be as generic as possible.

Having written OSS and also having written enterprise applications, it seems plainly obvious to me why a homegrown solution is preferred. Code developed internally is understood by the team (you may not understand the underlying implementation of a dependency), and can be tailored exactly to suit your needs (ignoring edge cases that aren’t relevant, removing unneeded features). And you never have to worry about maintainers disappearing, breaking changes being introduced, or bugged releases that you can’t do anything about.

I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law? What’s so extreme about being a responsible developer? I didn’t say “every solution should be developed in house” (though I think most large projects would be better for it!). Obviously there is a cost associated with in house solutions and you should gauge that cost to see if it’s worth it for your application. But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

nullspace · on Sept 21, 2019

To counter your arguments, I'm going to use a couple of typical examples of when an in-house versus open-source / external debate comes up. I'm not counting the infamous "leftpad" cases, those are usually trivial, and really don't matter in the grand scheme of things. If it's a one-liner, just implement it yourself.

1. A high-level database or queue lib, or a custom / powerful serialization lib, or (relevant to this topic) an ORM or other foundational/low-level part of your tech stack.

What you can expect to happen is that a bunch of very good programmers early on build powerful abstractions using macros, metaprogramming, and advanced type system concepts, and build up a codebase adding up to a few thousand lines. It just works, it's a good system - a few bugs are patched by the team every month, but that's fine. Fast forward a few years: the programmers have moved on, having "onboarded" the rest of the team to the codebase during their respective last weeks, but given how complicated the codebase is, no one is really capable of debugging it and fixing issues. And given that it's not open-source, it never got an opportunity to build a community of contributors. Your team is now SOL, and it's going to take _months_ to replace it with a more well-maintained open-source solution.

2. Building an A/B testing implementation in-house - again a couple of good programmers build a working, scalable, basic system in a weekend. It actually works and the code is good, simple, readable and well-tested. But then, your PM team or your Marketing team wants you to add graphs. Then export the data to Redshift. And then tweak the algorithms powering the backend. Then multi-armed bandit. And so on. Now, what was once a weekend project turns into months of work - whereas there exist paid services that do this for you.

Sometimes it's unavoidable: external alternatives are genuinely not good. But I strongly think you have to be very, very careful about building systems in-house when they are not your business.

> I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law.

Sorry for this. I do feel quite strongly against your original comment (at least the way it was written without context), and I think it's the _opposite_ of being a "responsible developer" in all but edge cases, and think you are wrong. But calling it an example of Poe's law was not right on my part, and was harsh.

> But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

I've done this, and have written both in-house and OSS code, but in-house only very reluctantly - for example, when there's just one or two maintainers committing code, and there's no alternative. But even then, I have usually forked the code and used that, or parts of it, as the base, rather than starting from scratch.

scarface74 · on Sept 21, 2019

There is hardly ever a time where an in house solution is better than a third party one for cross cutting concerns. Most of the packages are open source.

tomc1985 · on Sept 21, 2019

I dunno man, my day job working on Rails code uses a custom mailer and job queueing system, and every time I have to work with it I really wish they'd used ActionMailer and ActiveJob.

rmilejczz · on Sept 21, 2019

Like I said, “almost” always. Really the larger the application and the longer time you as a dev will work with it, the more meaningful it becomes to write your own solutions.

It’s really a balance, but I don’t think it’s a balance most devs consider and they really should.

Ace17 · on Sept 21, 2019

While I agree with you, I'd like to point out there are interesting exceptions: software components that can never be "complete". Such components require permanent maintenance workforce, and you might not want to dedicate resources for this.

Such as:

- API abstraction layers (like SDL, Allegro, SFML, etc.): you want to support new operating systems / new APIs by default. And most of the time, you don't want to spend time learning about the specifics of X11 window creation or Win32 events, as this would be throw-away knowledge anyway.

- hardware abstraction layers: you want to support new hardware by default, this is why we use operating systems and drivers.

- Format/protocol abstraction layers: if your game engine only uses JPEG files directly coming from your in-house asset pipeline, it's perfectly fine to develop in-house loaders (from scratch or from stb_image). But if your picture processing command-line tool aims to support every file format (especially the ones that don't exist yet), then you should rather go with an updatable third-party library, which will allow you to get all new formats by default.

- all kinds of optimizers, including compilers, code JIT-ters, audio/video encoders, etc. More generally, all code that uses some heuristic to try to solve a problem that's not completely solved/solvable. You might be ready to accept the performance of a specific version of, for example, libjit. But you might instead consider that in your case, not having state-of-the-art JIT performance might be detrimental to your business; in this case you want to get the performance enhancements by default.

collyw · on Sept 21, 2019

Lack of testing, lack of documentation and lack of use would be reasons that your claim is usually untrue. You can't Stack Overflow a problem and see if anyone else has encountered it before.

rmilejczz · on Sept 21, 2019

This is a terrible reply, what are you even trying to say? You can’t stack overflow a problem so don’t write your own in-house solutions? Lack of testing? We write our own tests. We write our own documentation.

It’s crazy to me how many people on HN are ignorant to the costs of third party dependencies and the benefits of in house solutions when building large applications.

collyw · on Sept 21, 2019

I am trying to say that most of the home grown solutions I have seen have been pretty poor quality, and lack documentation especially. Do enough maintenance programming and you will understand.

If you do test and document your own stuff properly you are in a small minority. Why not release it for others to use?

SkyBelow · on Sept 21, 2019

In house means more customized to the specific problem but with far less expertise in the general technology. I find the latter almost always outweighs the former when working at any cost center tech shop.

goatlover · on Sept 21, 2019

Otherwise known as NIH on steroids.

dymk · on Sept 21, 2019

Opposite; an in-house solution is almost always worse than an external dependency when that dependency is something as important to get right as an ORM.

benjaminbrodie2 · on Sept 21, 2019

The best of both worlds is write your own universal preprocessor...

beagle3 · on Sept 21, 2019

The problem in many cases is actually in the OO part, in my experience - in the vast majority of cases where databases and persistence are concerned, staying in the procedural/structured + relational world keeps things simple, whereas objects often obscure what is actually happening, and invoke opaque magic such as ORMs.

I wonder what your experience would have been if, after dropping the ORM, you had gone one step further and dropped the objects.

sjwright · on Sept 21, 2019

This.

After contemplating my distaste for ORMs more carefully, I've come to the realisation that my objections aren't so much to do with the concept of an ORM but rather object orientation itself—and the fetish of treating it as the perfect hammer for every nail.

For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it usually turns out to be a mistake; de-objectifying can often result in simpler, shorter code with fewer data bugs.

Ultimately, the right answer depends on the nature of your particular business logic, how data flows in your wider ecosystem, and pragmatically, the existing skills of your workforce.

stefanos82 · on Sept 21, 2019

Hahaha! ^_^

Seriously Jesse, isn't this the very same reason __some__ people end up implementing yet another programming language without realizing it?

First they start out of exasperation with X language they use, because they hit some obstacles or limitations, and before they know it they end up implementing a newly created language.

You know what's the fun part? In their attempt to fix the aforementioned language's issues, they end up introducing __the very same problems__ in their own language, only under different "cloak" so to speak.

It's a vicious cycle I'm afraid...

mruts · on Sept 21, 2019

So are you suggesting that we can’t do better than assembly machine code?

jdsully · on Sept 21, 2019

SQL is great if you will have multiple applications looking at the same dataset. E.g. An employee management program and a payroll program. In this case you should design a sane schema and mold the app around it.

ORMs are terrible in this sort of world since they tightly couple the application to the data. But if you will only ever have one application anyway the abstraction of a separate schema is pointless.

erik_seaberg · on Sept 21, 2019

A lot of people who believe only one app (or one language) accesses their org's datastore are mistaken. You have to take extreme measures to prevent ad hoc uses from popping up.

philwelch · on Sept 21, 2019

Yes, yes, yes.

Why is this the case?

1. If you are doing anything interesting, people are going to ask questions about what you are doing, and the best way to answer those questions is going to be by querying your database.

2. One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore: the legacy service and whatever you're replacing it with. I suspect any alternative to this arrangement will be an even worse idea, e.g. taking a deliberate outage to perform a likely-irreversible migration.

zbentley · on Sept 21, 2019

> One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore.

You should not do this. It removes almost all of the benefits of extracting things into a separate service (services should own their data and the only means of accessing it should be via their APIs). That's not utopian; that's one of the main reasons you do a service extraction in the first place.

philwelch · on Sept 21, 2019

Right, so let’s suppose you already segmented the data to two different backing datastores, and your monolith is now connecting to both of them instead of just the one. Now you can do the service migration, at which point you still run into the situation I’m discussing.

zbentley · on Sept 21, 2019

Cutovers are hard, to be sure. Ideally they should also be short (the time a service undergoing mitosis spends talking to the old and new locations should be measured in days or hours or less).

Don't choose general data access patterns for the infrequent occurrence of cutover. Cutover is when you break a few rules and then immediately stop doing so. Build for everyday access patterns instead (which should be through the API of whatever owns the data--SQL is a powerful language and a really shitty API).

philwelch · on Sept 21, 2019

Stored procedures are a better API than arbitrary SQL. You may even be able to enforce it by granting EXECUTE permissions but not SELECT permissions.

vips7L · on Sept 21, 2019

The simple solution to 1 is to never allow direct database access. Api only.

pnako · on Sept 21, 2019

Of course. But surely you don't let anyone access your API, and you put it behind another API, right? Just in case you need to change that first API without breaking all the users.

xwolfi · on Sept 21, 2019

Never even tell anyone you have one, or else the founder will pat one of your most junior devs on the back and ask if he can give DB access to that other team who needs to make money :D

philwelch · on Sept 21, 2019

So you do all your analytics by running a series of service calls and then writing a script to collate them into the needed results? Seriously?

zbentley · on Sept 21, 2019

I'm not the GP, but yes, absolutely. There are plenty of things that make this less than awful:

- The existence of tools that allow structured access to multiple APIs (GraphQL is a nice middle ground between "YOLO any queries you want" and "you only get row-by-row access exposed by the web APIs").

- The existence of data on multiple internal data stores. Analytics folks usually are not prepared to engage with the complexity of data being stored across handfuls or more of different stores with different schemas. The owner of the application knows how to join that stuff better than they do.

- Building intermediate/denormalized stores isn't frowned upon just because analytics shouldn't run ad hoc queries on the main production DBs. Expose change streams or bulk ("too much" data) endpoints and make it easy to load their results into a reporting system, which can be raw SQL. It's not redundant; if you don't do this, the following conversation starts to happen often: Q: "I'm running raw analytics queries on production and it's not quite working, can we just make $substantial_schema_change so my report works/is fast?" A: "No, we explicitly chose not to structure the DB/index/whatever like that because it seriously fucks up a real user access pattern."

philwelch · on Sept 21, 2019

Forcing analytics to go through the API doesn’t actually reduce load on the production DB, it just increases load on the API itself. Step 1 should probably be a dedicated read replica and step 2 should probably be an ETL process.

yowlingcat · on Sept 22, 2019

Ding ding ding. Dedicated read replica and an ETL gets you to a point where queries don't bring down prod. If you have an analyst org running wild making bad decisions about data that they think says things it doesn't -- that's probably a good sign that it's time for a dedicated data engineering team, and potentially a BI flavored data science team as well.

zbentley · on Sept 23, 2019

Analytics queries bringing down prod seems . . . pretty amateur hour. I'm more interested in whether or not analytics queries actually get the data they're interested in when they want it. The reporting team is likely not better versed in what means what than the developers who work on the application databases. What about multiple internal DBs that reporting wants to analyze as if they were one? What about schemas that change over time, obsoleting the analytics team's assumptions? Reliable, versioned data access APIs address both of those families of problems. Yes, it's harder than "YOLO query prod". It also works for longer without breaking, and jibes with the scale-out plan (usually discrete APIs, sharding, and then maybe microservices and more families of APIs if you're mature enough).

yowlingcat · on Sept 24, 2019

> Reliable, versioned data access APIs address both of those families of problems.

They only address it in so far as they push it downstream to the analyst, who as you mentioned, "is likely not better versed in what means what than the developers who work on the application databases."

There's a reason why datalakes exist, and having used them at past N companies, I think this is why data engineering of the BI flavor becomes necessary at the point that reporting becomes critical. An API is strictly worse than a datalake, and it's not hard to set up and maintain the latter. API versioning and communication are for frontend integrations, paid partner integrations and potentially (although I'd probably lean more on gRPC and the ilk) microservice to microservice interactions. But, I like to avoid building unnecessary API surface when I can.

troxwalt · on Sept 21, 2019

What do you mean by this? What other way, other than digging right into the data, is there to access the database? Isn't it all through APIs?

dkersten · on Sept 21, 2019

I like Clojure’s HugSQL[1] for this reason: you can simply write raw SQL, but when you start duplicating code, you can start factoring those bits out into composable “snippets”. The best of both worlds: composability and reuse, while still writing raw SQL.

[1] https://www.hugsql.org/

rocho · on Sept 21, 2019

I have the opposite view. I find ORMs annoying and obscure, and I think they introduce duplicated code.

If you need to run a certain query in multiple places, you need to repeat the same ORM expression or refactor it into a function. I find it much better to have a module with all my SQL queries as strings. That way whenever I need to run a query I reference it from there. Of course it helps to use meaningful names.

This approach has a lot of advantages over ORMs:

* you know exactly what gets executed
* automatic DRY code
* the names of the SQL queries in the code are self-explanatory and the reader doesn't have to parse the ORM expression every time

Schema definitions are in standalone SQL files, as well as my migrations.

The only disadvantage is that it may be difficult to switch to a different database system, but that is not a problem for us.
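To make the idea concrete, here is a minimal sketch of a "module of named SQL strings", using Python's built-in sqlite3. The table, columns and constant names are invented for illustration; a real setup would keep the constants in their own module and the schema in standalone SQL files.

```python
import sqlite3

# Hypothetical "queries module": every statement lives here under a descriptive name.
CREATE_USERS = (
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, active INTEGER NOT NULL)"
)
INSERT_USER = "INSERT INTO users (name, active) VALUES (?, ?)"
SELECT_ACTIVE_USER_NAMES = "SELECT name FROM users WHERE active = 1 ORDER BY name"


def active_user_names(conn):
    """Call sites reference the query by name instead of repeating the SQL."""
    return [row[0] for row in conn.execute(SELECT_ACTIVE_USER_NAMES)]


conn = sqlite3.connect(":memory:")
conn.execute(CREATE_USERS)
conn.executemany(INSERT_USER, [("bob", 1), ("alice", 1), ("carol", 0)])
print(active_user_names(conn))
```

The exact SQL that runs is visible in one place, and the name at the call site says what the query is for.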

aidos · on Sept 21, 2019

Your argument does not stand up. I can have a file full of ORM SQL fragments the same as your file of strings. And I can compose mine together safely and more flexibly than strings.

scardine · on Sept 21, 2019

Have you tried Python's SQLAlchemy, the ORM parent posts are praising? The `sqlalchemy.sql` module is awesome and pretty much maps 1:1 to raw SQL.

Composing SQL expressions using this library instead of using string interpolation/concatenation has several advantages:

* DRY and composition
* safety
* portability (if you have to switch the underlying DBMS)

Often the result is as good or better than my raw SQL. The fact that Python has an amazing REPL makes the process pretty much like testing queries in the database prompt but with less cognitive switch between languages.

In the end it is a matter of taste, but I have to agree with parent posts, SQLAlchemy raises the bar for other ORMs.

dsego · on Sept 21, 2019

> I find much better to have a module with all my SQL queries as strings.

But you can't compose them, so there is a lot of duplication. Also, how would you handle dynamic filters and columns? Concatenating strings? That seems error prone. At least a nice query builder would be useful, but then the whole just write sql thing falls apart.
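For what it's worth, the "dynamic filters and columns" case can be handled without concatenating user input, though it does end up looking like a tiny query builder. A rough sketch with Python's sqlite3; `build_select`, the `people` table and the column whitelist are all hypothetical names made up for this example:

```python
import sqlite3

# Identifiers cannot be bound as parameters, so column names are checked
# against a whitelist; values are always passed as bound parameters.
ALLOWED_COLUMNS = {"id", "name", "city"}


def build_select(columns, filters):
    """Return (sql, params) for a SELECT on the hypothetical `people` table."""
    bad = (set(columns) | set(filters)) - ALLOWED_COLUMNS
    if bad:
        raise ValueError(f"unknown columns: {sorted(bad)}")
    sql = f"SELECT {', '.join(columns)} FROM people"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bo", "Rome"), ("Cy", "Oslo")],
)

sql, params = build_select(["name"], {"city": "Oslo"})
rows = conn.execute(sql + " ORDER BY name", params).fetchall()
print(sql, params, rows)
```

Which rather supports the point: once filters are dynamic, some structured builder appears, even if it is twenty lines of your own.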

revscat · on Sept 21, 2019

Hahaha, wow! That’s just about the most awful thing I’ve ever heard. If you ever find yourself sitting across a table from an interviewer, I would definitely recommend not including this little tidbit in the conversation.

arendtio · on Sept 21, 2019

ORMs seem to be a typical example of over-engineering. Often you don't need all the complexity they come with, and when you do, you are probably better off understanding exactly what you are doing.

So maybe building a minimal API, wrapping your SQL queries isn't such a bad idea after all.

thymanl23 · on Sept 21, 2019

The things I've found positive about ORMs are exactly that mapping of results to business objects. The things I've found "not worth it" are the query-building APIs baked into the objects. These principles can be seen in a lightweight ORM I made, PureORM [1].

[1] https://github.com/craigmichaelmartin/pure-orm

takeda · on Sept 21, 2019

> and handle mapping data explicitly.

Here's your problem

inimino · on Sept 21, 2019

There is a lot of ancillary complexity in database connection libraries that we could attack before replacing the standard structured query language by some poorly considered mapping of objects to and from relation(s), inspired by poorly understood bad old OOP, which is generally what all ORMs boil down to.

CraneWorm · on Sept 21, 2019

You could consider something like Slick instead of an ORM:

http://slick.lightbend.com/doc/3.2.0/orm-to-slick.html

inanutshellus · on Sept 22, 2019

This is exactly what MyBATIS is for.

You throw in SQL, provide a simple mapper, done. IMHO it's far superior to ORMs when your database is or may become complicated.

_the_inflator · on Sept 21, 2019

It is striking the balance between your own queries and ORM.

My rule of thumb is that I always go with ORMs for MVPs and small apps. Optimizing for speed usually means going deeper and building a system or queries for yourself. Until that point I usually stick to less verbose code and more to business rules.

zug_zug · on Sept 21, 2019

Views are another way to gain reuse.
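A small illustration of what that reuse looks like, run through Python's sqlite3 so it is self-contained. The `orders` table and `paid_orders` view are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, total REAL, paid INTEGER);
    INSERT INTO orders (customer, total, paid) VALUES
        ('ann', 10.0, 1), ('ann', 5.0, 0), ('bob', 7.5, 1);
    -- The shared filter is defined once, in the database.
    CREATE VIEW paid_orders AS
        SELECT id, customer, total FROM orders WHERE paid = 1;
""")

# Two different queries reuse the view instead of repeating "WHERE paid = 1".
revenue = conn.execute("SELECT SUM(total) FROM paid_orders").fetchone()[0]
customers = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customer FROM paid_orders ORDER BY customer"
    )
]
print(revenue, customers)
```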

akho · on Sept 21, 2019

As are stored procs, user-defined functions, triggers, scheduled jobs,.. SQL databases are programming environments, not pure datastores.

baq · on Sept 21, 2019

Most are exceptionally bad at being programming environments, though.

seunosewa · on Sept 21, 2019

> I started coding some simple functions to help map the tabular data to objects.

Maybe you should not do that? Can you give us an idea of the domain problem you were trying to solve that made you feel the need for that?

iamsb · on Sept 21, 2019

I use jdbi.org in Java all the time because it does just that for me.

0x445442 · on Sept 21, 2019

MyBatis

725686 · on Sept 20, 2019

Each time I see someone complain about ORMs I remember Greenspun's tenth rule[1], which adapted to ORM would be:

"Any sufficiently complicated program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a decent ORM."

ORMs are hard for a reason. Using an ORM doesn't mean you can't or shouldn't use plain SQL where the situation calls for it. You can mix and match perfectly fine.

[1] https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule Edit: typo.

geodel · on Sept 20, 2019

To me it is more as Ted Neward describes: "ORM is the Vietnam of Computer Science"[1]

"Although it may seem trite to say it, Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."

[1] http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...

mapgrep · on Sept 20, 2019

Ya I’ve heard this one a lot. It’s kind of funny to say and does humorously underline the complexity of the problem but people take it seriously. So to take it seriously for a second:

There was no good reason to be in Vietnam; even taking the stated rationale as a given, which many people did not, it was a concern many levels removed from the actual safety or functioning of American society.

ORMs in contrast achieve much more proximate goals — they solve a real problem and can measurably reduce the amount of code you have to write. It is tedious to write the same sort of SQL query over and over. Even if you prefer to use literal sql for the complex stuff (as I do), ORMs tend to be a significant win (in fewer LOC to write) on abstracting out basic queries.

setr · on Sept 21, 2019

Tbh you could easily claim that the explicit goal, mapping to objects, is incorrect.

The real value is to reduce the damage of the SQL language itself — the unnecessarily ordered clauses, the arbitrary inconsistencies in syntax, the worthless parser errors, the lack of any static typechecking — which cause so much code bloat and debug headaches.

There are two reasons to use the ORM: to not learn SQL, and to generate SQL.

The first reason is the commonly provided one, and what leads us into Vietnam. The latter is why people try to avoid ORMs, yet find themselves back in Vietnam.

What we really need is a less shitty version of SQL.

bitexploder · on Sept 21, 2019

I have found the best ORMs don’t hide their SQLness much. SQLAlchemy is pretty great, but you don’t get full use out of it unless you have your arms around SQL itself. When you use an ORM to cut down on chores it’s great. When you use an ORM to avoid your datastore and its idiosyncrasies it is worth taking a long hard look at why :)

Most of the ORM interactions are well formatted code that don’t hide the datastore much. As long as you let the ORM map in objects and it’s performant, life is alright. When the ORM starts running the show there may be no coming back.

mapgrep · on Sept 21, 2019

I enjoyed this comment because I only recently tried Sqla for a project and agree fully that it embraces sql, in large part by NOT renaming/rethinking things at the oop level - methods very much tend to be named after sql verbs. I really liked this.

What I liked less was all the setup/config ceremony. Compared to ActiveRecord (the Ruby lib not necessarily the orm concept) I was using more LOC before I got to the part where I started saving time on simple queries. I realize this is because sqla uses the datamapper model. But for me the sweet spot would be auto setup like AR with the sql-like syntax of sqla.

bitexploder · on Sept 22, 2019

Yeah. When you get good at composing SQLa code it is really nice. You can functionally build your queries and DB interactions. It just fit well with my way of coding. ActiveRecord had a lot of magic to my taste, but you could still drop down with it. Agree SQLa can be tedious at first. No tool is perfect :)

cturner · on Sept 21, 2019

> What we really need is a less shitty version of SQL.

My view is the opposite. The power of SQL perpetuates a low-quality software culture. The root issue is a dev culture that can't see past databases.

A lot of software design runs like this: (1) translate business patterns into a relational schema; (2) build interactions with that schema; and (3) as that gets harder, use SQL arcana and ORMs and views and stored procedures to squeeze out flexibility.

I worked like this for the first decade of my career. My systems got some use, and struggled along, but they are failed projects.

Repeatedly I had this feeling: the project is almost done, but there are some concurrency issues where I would not even know how to start addressing them.

The database-centric design made it impossible to work past that.

After much searching, I came to this: Stored State is a brittle and unwieldy thing, and you want as little of it in your life as possible. The more you have, the harder you have to work to get anything done. Databases are an institution of Stored State.

As an alternative, you can derive state from messages.
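One possible reading of "derive state from messages" is an event-sourcing-style fold: keep an append-only log of messages and compute the current state by replaying it. That interpretation is an assumption, as are the message types and the account example in this sketch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    account: str
    amount: int


@dataclass(frozen=True)
class Withdrew:
    account: str
    amount: int


def apply(balances, message):
    """Return a new balances dict with one message applied."""
    current = balances.get(message.account, 0)
    if isinstance(message, Deposited):
        delta = message.amount
    elif isinstance(message, Withdrew):
        delta = -message.amount
    else:
        raise TypeError(f"unknown message: {message!r}")
    return {**balances, message.account: current + delta}


def replay(log):
    """Derive the current state by folding over the whole message log."""
    balances = {}
    for message in log:
        balances = apply(balances, message)
    return balances


log = [Deposited("a", 100), Withdrew("a", 30), Deposited("b", 5)]
print(replay(log))
```

Nothing here is stored as mutable state; the balances are recomputed from the log whenever they are needed.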

wwweston · on Sept 21, 2019

What kind of concurrency issues were you running into with a RDBMS that has atomic transactions?

Are we talking about distributed store situations where CAP theorem limits apply, or something else?

collyw · on Sept 21, 2019

I see the same problems, but they are usually caused by a reluctance to modify the database schema. Once you get comfortable doing that and keep your schema matching the requirements, it solves a lot of the problems.

inimino · on Sept 21, 2019

> As an alternative, you can derive state from messages.

Bingo!

Unfortunately, if people won't even listen to Alan Kay on this stuff, who will they listen to?

dsego · on Sept 21, 2019

What is the solution then? Are you talking about event sourcing or something different?

mst · on Sept 21, 2019

An ORMish thing that considers its primary purpose to enable metaprogramming SQL is actually useful.

In http://p3rl.org/DBIx::Class perl has had such a thing for over a decade now.

It makes me cry that nobody's ever adequately cloned it into other languages. Eventually I'll probably do so myself.

weberc2 · on Sept 21, 2019

Isn’t that just a query builder? I’m not familiar with Perl or your library, so maybe you could tell me where I’m wrong?

mst · on Sept 21, 2019

What I was trying (and evidently failing) to say is that a sufficiently powerful query generation system that's designed for people who actually like their database to be able to generate the exact SQL they would've written by hand is essential to the 'mapper' part of the equation being able to smooth out any impedance mismatches between the appropriate object model and the appropriate database schema.

Hopefully that longer answer is a bit clearer than my first attempt.

misterdoubt · on Sept 21, 2019

I guess I don't know enough Perl to understand how this differs dramatically from something like SQLAlchemy in Python.

mst · on Sept 21, 2019

SQLAlchemy is pretty close. Been quite a few years since I've had a chance to get drunk with the authors and compare notes though, so I'm not going to try and get into details because I'm pretty much guaranteed to get some of them wrong.

noisem4ker · on Sept 21, 2019

Something like jOOQ?

https://www.jooq.org

Too · on Sept 21, 2019

That looks fantastic. No magic mumbo jumbo mapping, just simple type-safe SQL. Both syntax safety (no need to remember which of WHERE and HAVING comes first) and type safety on all fields. It's not advertised in the examples on the front page, but I also take for granted that SQL injections are completely impossible, since all data goes into functions and is not string formatted, without the mess of having to remember the order of arguments as with prepared statements.
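The injection point is general rather than specific to jOOQ, so here is a sketch of the difference using Python's built-in sqlite3 (not jOOQ's API). The `users` table and the hostile input are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

hostile = "nobody' OR '1'='1"

# String formatting: the input becomes part of the SQL text and rewrites
# the WHERE clause, so every row matches.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Bound parameter: the input is passed as a value and is only ever
# compared against the column as a plain string.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```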

Anyone got tips on similar frameworks for languages other than Java and for other DBs?

xadoc · on Sept 21, 2019

Dapper for C# made by StackOverflow team

https://github.com/StackExchange/Dapper

Noumenon72 · on Sept 21, 2019

Ye. Static typechecking is the only thing in his list that I really care about, since you can "git gud" at SQL and not be bothered by the syntax/ordering/parser concerns. jOOQ is exactly what I want to bridge the gap between Java and the DB.

setr · on Sept 21, 2019

The ordering is always a problem, because your logic may not follow it. Eg if your set of conditions apply to multiple queries, then you might know your where conditions before you know your select/from clauses.

So instead of building up your sql string in a straightforward fashion, you need to have at minimum an abstraction that delays construction.

You get led into Vietnam as almost a direct result of SQL’s context-sensitive clauses.
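A minimal sketch of the kind of "delayed construction" abstraction meant here: clauses are collected in whatever order the program logic discovers them, and the SQL string is only assembled at the end. The `Query` class and the `users` columns are hypothetical, not taken from any library:

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Collects clauses in any order; SQL text is only produced by render()."""

    columns: list = field(default_factory=list)
    table: str = ""
    conditions: list = field(default_factory=list)
    params: list = field(default_factory=list)

    def where(self, condition, *params):
        self.conditions.append(condition)
        self.params.extend(params)
        return self

    def select(self, *columns):
        self.columns.extend(columns)
        return self

    def from_(self, table):
        self.table = table
        return self

    def render(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, list(self.params)


def recent_and_active(query):
    """Shared conditions, written before any SELECT/FROM is known."""
    return query.where("active = ?", 1).where("created_at >= ?", "2019-01-01")


sql, params = (
    recent_and_active(Query()).select("id", "email").from_("users").render()
)
print(sql, params)
```

Even this toy version is already the first step towards a query builder, which is the slippery slope being described.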

spion · on Sept 21, 2019

Or you could go with LINQ's approach from (x) where (y) select (z).

Honestly, I think LINQ and entity framework successfully solved most ORM concerns

mamcx · on Sept 21, 2019

> What we really need is a less shitty version of SQL.

I think the same. In my spare time I'm building a relational language (http://tablam.org, and I'd welcome more help!) starting even lower: without an RDBMS.

I worked with FoxPro in the past, and it was possible to build a full app with UI, reports and everything else you can imagine in a database-oriented language. You code the UI in Fox, query with Fox, write triggers with Fox, etc...

I'm looking to capture the same essence.

dragonwriter · on Sept 21, 2019

> What we really need is a less shitty version of SQL.

Any of the D (Date and Darwen's, not Digital Mars’s) family of relational languages might fit the bill.

SomeOldThrow · on Sept 21, 2019

> ORMs in contrast achieve much more proximate goals — they solve a real problem and can measurably reduce the amount of code you have to write.

Why would you optimize for LOC rather than expressing the behavior you want well? In some situations sure—rapid prototyping—but that’s an odd assumption in a general case.

jacques_chester · on Sept 21, 2019

If you haven't read that post, it's worth reading in its entirety. The title is not the entire argument.

yawaramin · on Sept 20, 2019

> ORMs are hard for a reason.

Yes, the object-relational impedance mismatch. It's the classic case of having a hammer (OOP) and trying to make everything look like a nail.

delusional · on Sept 20, 2019

I would actually let OOP off the hook here. I think what did the harm in this case was the Java generation. The generation of programmers that were told that in the future they would only have to write the "business logic", and everything else would just happen. They were taught JavaBeans, ORMs, gigantic frameworks. They completely forgot that their code actually needed to execute, and no one cared about their "business logic" if the application didn't do what it was supposed to do.

This generation has only ever used ORM's, and so to them those tools must solve some hard problem, they are so complex after all. SQL must be hard.

It turns out that SQL is actually much simpler than ORM's. The failure modes are much simpler, and the implementations more robust. Sure, writing the code can be tedious, but tedious is not hard. Writing brainless code every once in a while gives you time to reflect on the design of your system, and think about the larger context.

revscat · on Sept 21, 2019

This is the polar opposite of my experience. Not to mention that you undercut your own argument by admitting SQL’s tediousness: tedious code tends to lead to more tedious code, as subsequent developers fear breaking something, so they just add a layer on top of it, rather than addressing root causes. So that SQL view you had now has 10 inner joins in it, a union, and is being used by five other views now, each with their own similar levels of complexity. Trying to do any refactoring on this is almost humanly impossible, because you can’t keep all of it in your head.

And this is an extremely common situation to find yourself in. Code bases which are middle- to large-sized, and which are inevitably touched by various hands of various skill levels, tend towards complexity.

SQL is the worst source of unmaintainable and difficult to refactor code. It’s difficult to unit test. Error messages are more often than not inscrutable. (I’m looking at you, Oracle.) There is no idiomatic way to break up complex SQL into functions or classes. There’s no type checking. You don’t have the equivalent of Ruby Gems, Python’s pip, or Swift’s CocoaPods. IDE support is limited to syntax highlighting. Compare this with something like Eclipse or IntelliJ where you can just Ctrl-click on something and go to the definition. Want to rename a public method in a statically typed language? Pretty easy. Want to rename a column in SQL? Yeah, good luck. Not impossible, but you don’t have any guarantees that something won’t break until runtime.

So yeah. ORM has its place, and modern ones work very well at abstracting away SQL’s weaknesses.

scarface74 · on Sept 21, 2019

I seriously don’t think I have worked with a single developer in 20+ years who didn’t at least know simple SQL and joins.

goatlover · on Sept 21, 2019

> It turns out that SQL is actually much simpler than ORM's.

Are they simpler when it comes to matching the data to the application's data structures? One advantage of ORMs is that they encourage this setup from the start.

dragonwriter · on Sept 21, 2019

> Are they simpler when it comes to matching the data to the application's data structures?

That depends on the modelling approaches taken by the application developer and DB developer.

mumblemumble · on Sept 21, 2019

I think that this is true if you're writing your application in a way that requires object-relational mapping in the first place.

And that's only necessary when you're trying to manage your data in the application in an object-oriented way. And managing your data in an object-oriented way implies more than just the simple fact of defining classes to serve as data records. Those classes can be entirely equivalent to a struct in a procedural language or a record in a functional one. And I can't remember ever suffering from object/relational impedance mismatch when working in a procedural or functional language. Implying that the spot where you really start getting into trouble is when you trot out some distinctive feature of an object-oriented data model.

I submit that the original sin is treating instances of those data classes as if they are discrete entities that can serve as an application-side proxy for some other discrete entity that exists in the database, almost as if ODBC were just a more REST-flavored alternative to CORBA. Which is a thing that I've often been tempted to do in an object-oriented language, but never in a procedural or functional one.

Which isn't to say that I don't use anything to help with talking to databases in those other styles of language. It's just that I retain SQL as my query language (there are plenty of reasons to do it, none of which I'll bother to repeat here) and rely on a more Dapper-style library to handle unpacking the results into data structures. And I don't really consider those to be ORMs; they're just a special class of data mapping utility library.
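To make "Dapper-style" concrete, here is a minimal sketch of that kind of mapping helper in Python, using only the standard library. The `query` function, the `CustomerTotal` record, and the schema are all made up for illustration; this is not Dapper's API, just the same idea: the SQL stays hand-written, and the helper only unpacks rows into plain records by column name.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import List, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class CustomerTotal:
    # A plain record: no identity tracking, no lazy loading, no save() method.
    name: str
    order_count: int
    total_cents: int


def query(conn: sqlite3.Connection, record_type: Type[T], sql: str, params=()) -> List[T]:
    """Run hand-written SQL and unpack each row into record_type by column name."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    wanted = {f.name for f in fields(record_type)}
    missing = wanted - set(columns)
    if missing:
        raise ValueError(f"query did not return columns: {sorted(missing)}")
    return [
        record_type(**{c: v for c, v in zip(columns, row) if c in wanted})
        for row in cursor
    ]


# Hypothetical schema and data, in-memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, total_cents) VALUES (1, 1500), (1, 2500), (2, 900);
    """
)

totals = query(
    conn,
    CustomerTotal,
    """
    SELECT c.name AS name, COUNT(o.id) AS order_count, SUM(o.total_cents) AS total_cents
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    HAVING SUM(o.total_cents) >= ?
    ORDER BY c.name
    """,
    (1000,),
)
```

The record type describes the shape of one query's result, not a table, so nothing here pretends the object is a live proxy for a row.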

So, in conclusion, I think that a more accurate stab would be "Any sufficiently hastily built, object-oriented, database-driven non-ORM program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of a decent ORM."

moring · on Sept 21, 2019

Two counter-points:

(1) While treating a single SQL table as a distinct entity is a sin by your argument, I would argue that treating a set of tables as a distinct entity is not. I'm referring to what DDD calls an aggregate, e.g. a customer (address + email history + scoring + whatever tables), or an order (line items, shipping info, payment info, ...) Reading the comments here, I'm starting to wonder if some kind of holy grail or at least useful approach lies in there.

(2) The impedance mismatch has two ends, and one might as well argue that the problem is not mapping tabular data to objects, but mapping objects to tables. This might be an underrated benefit of what is usually called "NoSQL", dwarfed by the whole discussion around "schemaless". Unfortunately, I don't have any experience in that area, but it's something I have long planned on trying. Note that I'm not trying to say that NoSQL "solves the impedance mismatch" but rather that, in DDD's approach to solve persistence, NoSQL might actually be able to do what DDD wants (e.g. with respect to defining a "unit of consistency"), unlike tabular data.

mumblemumble · on Sept 21, 2019

A distinction that may seem like pedantry, but I think is actually the crux of the issue I've had with ORM:

The problem is not in treating entities in the database as entities. It's in treating instances of classes in the application's memory space as local proxies for those entities.

You're on much firmer ground if you understand them as simple pieces of information derived from the state of those entities as of some point in time. Both in terms of ending up with a more internally consistent and robust approach to data (by virtue of removing some temptation to preserve an illusion that this data can be presumed to always be complete and up-to-date), and in terms of not artificially cutting yourself off from most of the power of the relational model.

You're right that this tension is somewhat resolved by just switching to using an object store of some sort. (NoSQL is far too diverse of a subject to treat as a single unit.) But it's a very particular sort of resolution, because it's a sort of least common denominator approach where you just drag the data store down to the level of the programming model. It strikes me as akin to resolving the difficulties in maintaining complex software in Perl by resolving to never let anyone write anything more powerful than a shell or CGI script rather than by looking for a language that will better support you for the long haul. It can certainly be a fine and reasonable choice. The problem comes in when people don't fully realize that that's the choice they're making.

ryangittins · on Sept 21, 2019

> You can mix and match perfectly fine.

Thank you!

This seems like one of those topics where people often feel the need to pick a side for some reason. I've often heard criticism to the effect of, "people only use ORMs because they don't know SQL. Learn SQL!" It's seemingly impossible to convince these people that ORMs are fantastic for reducing boilerplate code and they can coexist right next to raw SQL for problems gnarlier than "select all foo where bar equals baz."

revscat · on Sept 21, 2019

This.

Earlier in my career I made a point to deep dive into SQL. Long story short: I realized that there are a lot of very good reasons to limit the amount of raw SQL in your application that have nothing to do with familiarity.

SQL is just a bad language, and it’s unfortunate that we’re still stuck with it, basically unchanged, decades after its introduction.

For me, it’s almost as anachronistic as COBOL.

kimi · on Sept 21, 2019

YMMV, but for me it feels like magic every time I use it. I can get things done in declarative way that would take a while in any programming language I know (and I'm a Clojure guy, so I value simplicity and conciseness).

perl4ever · on Sept 21, 2019

The only issue I really have with SQL is the lack of control over how a query is executed can really be an impediment. It's like how people were talking in another thread about getting frustrated with garbage collection.

I spent a lot of time working in a department that wrote a lot of ad hoc Oracle SQL, for updates to a production system, and complex reports, and there was one guy that was ten times faster than everyone else, and also more likely to get his code correct, and so I paid attention to what he did.

He would break down a complex set of operations into simple individual queries generating temporary tables; he just wouldn't bother fighting with the optimizer, or trying to predict what it would do. So he not only wrote queries that ran quickly, but he wrote them quickly, and he got them correct quickly.
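A rough sketch of that staging style, using SQLite from Python as a stand-in for Oracle (the tables and data are invented, and nothing here says anything about how either optimizer behaves). Each step is a simple query whose result lands in a temporary table that can be inspected before the next step runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        total_cents INTEGER
    );
    INSERT INTO orders (customer_id, status, total_cents) VALUES
        (1, 'paid', 1500), (1, 'paid', 2500), (1, 'refunded', 700),
        (2, 'paid', 900), (3, 'refunded', 400);

    -- Step 1: one simple query, materialized so it can be checked on its own.
    CREATE TEMP TABLE paid_totals AS
        SELECT customer_id, SUM(total_cents) AS paid_cents
        FROM orders WHERE status = 'paid' GROUP BY customer_id;

    -- Step 2: another simple query, independent of step 1.
    CREATE TEMP TABLE refund_totals AS
        SELECT customer_id, SUM(total_cents) AS refunded_cents
        FROM orders WHERE status = 'refunded' GROUP BY customer_id;
    """
)

# Each intermediate result can be looked at before moving on.
paid = conn.execute("SELECT * FROM paid_totals ORDER BY customer_id").fetchall()

# Step 3: the final report is a plain join over the two small tables.
report = conn.execute(
    """
    SELECT p.customer_id, p.paid_cents - COALESCE(r.refunded_cents, 0) AS net_cents
    FROM paid_totals p LEFT JOIN refund_totals r ON r.customer_id = p.customer_id
    ORDER BY p.customer_id
    """
).fetchall()
```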

I would read Tom Kyte, exhorting people to use the full complexity of the language and the Oracle optimizer, but from my experience, it just was not the way to go. I wrote many page (or more) long queries that were things of beauty and then found that breaking them down into simple ones actually was usually much faster.

One fundamental thing that I don't think gurus understand, is that for the average grunt in a typical corporation, there is a separation of duties, such that you can't just go and change the things that a system administrator controls. So saying "your database is configured wrong" doesn't address normal life.

Although the pure relational model may be nice to think about, I find it really convenient to have a certain amount of sequential context, and PL/SQL always seemed to me to have a disgruntled relationship with SQL, so I've gradually tended towards Microsoft alternatives.

james_s_tayler · on Sept 20, 2019

The best ORM (outside of ActiveRecord) I've seen was a proprietary hand-rolled one that solved the impedance mismatch.

It had a canonical XML format that entities were defined in and code generation for data access layers, domain models, view models etc.

It actually worked better than what I've seen developers put EF through. I found it nicer and simpler than the times I've worked with Hibernate.

I'm not going to say it was perfect, but every time I think back on it it makes me want to revisit the idea of leveraging a bit of code generation or metaprogramming to be able to have a canonical definition of an entity transformed into concerns that deal with the given entity at different points in the application.

That part of it just really hit the sweet spot for me.

taffer · on Sept 20, 2019

Have you checked out MyBatis or JOOQ? They sound pretty much like what you're describing.

james_s_tayler · on Sept 21, 2019

MyBatis has _some_ similarities, notably the defining entities in xml part. But it diverges after that.

Think about it more like MyBatis meets Spring Data JPA. Define that entity in XML, then run a code gen which gives you the CrudRepository class but also generates a controller that exposes an API with pretty good ability to specify ad hoc queries. Plus view models.

I think it worked because it was both reasonably well designed and hyper opinionated.

tomtomtom777 · on Sept 21, 2019

What you are saying exactly highlights what the author is missing: That if your application has some logic, you will eventually have to map your database rows to your in memory typed structures.

It sucks. Sometimes it sucks less if you map your queries as well, sometimes it sucks less if you stick to SQL and only map your results, sometimes it sucks so much you're better off with a no-sql solution.

But when using a relational database, ORM isn't optional.

Avamander · on Sept 21, 2019

Um, PL/pgSQL et al. for stored procedures exist; OOP is optional.

mannykannot · on Sept 21, 2019

OOP is optional, but if your program has any concept of structured data, then, regardless of whether that structure can be expressed syntactically within the programming language, you will necessarily have a mapping between that and the database. It might be as simple as a 1:1 mapping between relations and ADTs, with collections of references being used within the program to represent the keyed relationships that exist within the database scheme.

revscat · on Sept 21, 2019

So? Stored procedures suck. There are good reasons they have never seen serious traction and are infrequently used to the point of being irrelevant.

_glsb · on Sept 21, 2019

Major international banks have entire payment systems implemented in PL/SQL. I’m talking 2-3kloc procedures and thousands of them.

Avamander · on Sept 21, 2019

That's what you think, not having any knowledge of them. Many of the best and largest systems I've seen have been implemented in stored procedures.

nqzero · on Sept 20, 2019

Db4j is an async transactional database engine that uses Java as the query language. It doesn't eliminate the O/R impedance mismatch entirely (it's not a graph or object database), but it does simplify it greatly since everything is Java.

https://github.com/db4j/db4j

I love the idea and it's been fun for me to make demos with, but I've gotten almost zero feedback on the API, and what I have gotten is "can you add a SQL frontend". And yet neither ORM nor SQL is loved.

So the question (and I'm not suggesting that Db4j is the answer) is: "what's the API that would be most natural ?"

didibus · on Sept 21, 2019

Unless you don't have objects... then no need for an Object-relational mapping, not a well maintained one, and not an ad-hoc one. All you need is a DB connection pooling library, some helpers or libs to help you build dynamic SQL queries and that's all.
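As a sketch of what "helpers to build dynamic SQL queries" might look like at the small end, here is a hypothetical `build_order_query` function in Python. The column whitelist and the `orders` table are assumptions for the example; connection pooling is left out. Values always travel as bind parameters, and only whitelisted column names ever reach the SQL text.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

ALLOWED_COLUMNS = {"status", "customer_id"}  # hypothetical whitelist for this sketch


def build_order_query(filters: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT from optional equality filters."""
    clauses: List[str] = []
    params: List[Any] = []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"cannot filter on {column!r}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id, customer_id, status FROM orders"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO orders (customer_id, status) VALUES (1, 'paid'), (1, 'open'), (2, 'paid');
    """
)

sql, params = build_order_query({"status": "paid", "customer_id": 1})
rows = conn.execute(sql, params).fetchall()  # plain tuples, no objects involved
```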

j88439h84 · on Sept 20, 2019

Very good point. I'll need to remember this.

SomeOldThrow · on Sept 21, 2019

What is a decent ORM under this definition? I don’t think I’ve ever seen one sufficient for business logic.

daenz · on Sept 20, 2019

ORMs lure you in with a false sense of neat abstraction. They have nice intuitive examples on their home pages. But then you use them in the real world, doing gnarly queries, and you realize that doing anything powerful and fast in the ORM requires its own completely separate abstractions, which are often difficult for the uninitiated to follow. It's also often a big pain to debug the raw SQL that gets compiled after the ORM does its magic.

The argument I've made before when going down the path of ORMs has been: do we foresee needing to use this model code on a different database engine? Outside of simple toy applications, or needing to support different engines with the same code, I agree that ORMs are more trouble than they're worth.

wvenable · on Sept 20, 2019

You don't use ORMs for gnarly queries -- that's not what they are for! They are for making manipulating the entities easier -- reading the data out of the database in a way that makes it easy to modify.

You can (and should) use them for simple queries. You have a list of entities you want to query and filter, that's going to be fine. Joins are fine. But if you're doing some complex analysis, an ORM is the wrong tool. That doesn't mean it's a poor abstraction, or difficult to follow, or something to be avoided. It's not the right tool for that job. For the job it's designed for, it's going to save a lot of effort.

SQL is great for analysis -- it's pretty much what it's designed for. But for bringing data into your app and modifying it, SQL is cumbersome and verbose. If you're loading data into objects then you're just creating your own personal ORM anyway.

geophile · on Sept 20, 2019

ORMs make the simple things simple, and the complicated things impossible.

wvenable · on Sept 20, 2019

ORMs let you drop into SQL whenever you need, usually in a way that is fully compatible with the model, so that's entirely false.

philwelch · on Sept 21, 2019

Let's talk about how this works in reality.

In ActiveRecord, there's a method called find_by_sql. You can't call it directly; it's a class method on an ActiveRecord model. So you have to choose which of your ActiveRecord models should be used to instantiate the rows of your result set. (What if your result set doesn't really match any of your models? Pick one arbitrarily.) Your SQL has some extra columns. What happens to the data in those columns? They get monkey-patched onto the individual objects. (Which is stupidly expensive in Ruby.) Other than that, the individual objects are fine. They even have all your smart instance methods, which may or may not behave properly with all the ad-hoc monkey patching.

If you tried to short-circuit all of that nonsense, you tend to get arrays of hash tables. Which is, in my opinion, already a perfectly adequate interface!
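For comparison, the "array of hash tables" interface is about this much code in Python with the standard `sqlite3` module (the `posts` table is invented for the example). The result has a shape that matches no table, and nothing forces it into a model class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT);
    INSERT INTO posts (author, title) VALUES
        ('ada', 'On engines'), ('ada', 'Notes'), ('grace', 'Compilers');
    """
)

# The result shape (author, post_count) matches no table, so no model is involved.
sql = "SELECT author, COUNT(*) AS post_count FROM posts GROUP BY author ORDER BY author"
results = [dict(row) for row in conn.execute(sql)]
```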

james-mcelwain · on Sept 21, 2019

> What if your result set doesn't really match any of your models? Pick one arbitrarily.

I don't want to sound like the ORM defender, but I'm not sure I understand.

This sounds like a deficiency of Ruby and the ActiveRecord record model. In Java, for example, you'd just write a new POJO for your query, which isn't exactly difficult. There are no "smart methods" or whatever.

It is a valid criticism that this can proliferate data classes, but that depends on the application.

philwelch · on Sept 21, 2019

I was writing in terms of what I actually understand and have used—which doesn’t include any Java ORM. In fact if there are Java ORMs that consist solely of POJOs which are populated by raw SQL queries, I would gladly use them!

slicebo123 · on Sept 22, 2019

You can execute a "non-model" query like so:

results = ActiveRecord::Base.connection.execute(sql)

philwelch · on Sept 28, 2019

The API docs don’t make it clear whether that’s still possible so I didn’t mention it explicitly, but I have done that before and that’s what I was alluding to with the “array of hash tables” comment.

dkersten · on Sept 21, 2019

In my personal experience, ORM’s seem to encourage queries to get scattered throughout your logic (they’re just normal code and function calls, after all... at least, they look like it) and encourage mixing application-side logic with query logic. The former makes it incredibly hard to remove if you need to reach for raw SQL and the latter leads to bad performance due to many application-database roundtrips and not filtering enough before sending data to the application.

Yes, both of these things can be solved through disciplined modularisation of ORM logic, but in my personal experience across multiple companies, most developers simply aren’t that disciplined and treat ORM code as any other application code, instead of treating it as the remotely executed database code that it actually is.

In my experience, writing raw SQL (through https://www.hugsql.org/ in my case), you are instead encouraged to think of them as separate and carefully consider the boundaries, which helps keep the queries and application logic modular and allows for more carefully crafted queries that minimise roundtrips and data shuffling.
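A loose Python sketch of the same separation: the queries live together as named SQL, apart from application logic, and the application only refers to them by name. The `-- name:` marker format and the `load_queries` helper are made up for this example and are not HugSQL's syntax; in practice the SQL text would sit in its own file.

```python
import sqlite3
from typing import Dict, Optional

# Stands in for a separate .sql file that holds every query the module uses.
QUERIES_SQL = """
-- name: active_users
SELECT id, name FROM users WHERE active = 1 ORDER BY id;

-- name: user_by_id
SELECT id, name FROM users WHERE id = :id;
"""


def load_queries(text: str) -> Dict[str, str]:
    """Split a SQL document into {name: statement} using '-- name:' marker lines."""
    queries: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("-- name:"):
            current = line[len("-- name:"):].strip()
            queries[current] = ""
        elif current is not None and line.strip():
            queries[current] += line.strip() + " "
    return {name: sql.strip().rstrip(";") for name, sql in queries.items()}


queries = load_queries(QUERIES_SQL)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    INSERT INTO users VALUES (1, 'Ada', 1), (2, 'Grace', 0), (3, 'Edsger', 1);
    """
)

# Application code names the query it wants; the SQL itself stays in one place.
active = conn.execute(queries["active_users"]).fetchall()
one = conn.execute(queries["user_by_id"], {"id": 2}).fetchone()
```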

Again, this has been my experience, across a number of companies. Perhaps your experience differs, in which case, I’m jealous.

djrobstep · on Sept 21, 2019

That's absolutely not true.

With any of the ORMs I've used, as soon as you do something even slightly unorthodox like using a view, you're on your own.

mst · on Sept 21, 2019

Try http://p3rl.org/DBIx::Class in perl sometime.

If you're on your own using a view of all the simple shit, you're using an awful and pointless ORM.

wvenable · on Sept 21, 2019

All the ORMs I've used love views because views are just read-only tables. The dumbest ORM might not even notice the difference.

strokirk · on Sept 21, 2019

Which ones have you used?

pknopf · on Sept 21, 2019

Just use a Micro ORM from the get-go.

It's the perfect type-safe abstraction on top of raw SQL.

https://github.com/ServiceStack/ServiceStack.OrmLite

https://github.com/CollaboratingPlatypus/PetaPoco

Any errors you get are likely a result of the underlying database/provider (foreign key constraints, etc).

You should never write raw SQL (if possible). You don't need an ORM to achieve that.

daenz · on Sept 20, 2019

Running raw user SQL isn't a prerequisite for something to be an "ORM"; it's a useful feature that most ORMs try to include because the authors recognize the many shortcomings. Also, by writing raw engine-specific SQL, you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.

And by "drop into", this typically means writing custom stitching code that stitches the SQL cursor results back into the models again. It's rarely straightforward.

wvenable · on Sept 20, 2019

> you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.

The biggest selling point is a massive reduction in boilerplate code. Database agnosticism is a feature that almost nobody ever uses, so who cares! Your proposed alternative is engine-specific SQL so you lose either way. At least with an ORM, you'd lose significantly less. You'd just have to deal with the places you used SQL. Which, in my experience, is pretty small and pretty specific.

> this typically means writing custom stitching code that stitches the SQL cursor results back into the models again.

I often feel like people who complain about ORMs have either actually never used one or used a poor one. As long as my query matches the structure of my object(s) I don't need any stitching code. And if they didn't match, I wouldn't write stitching code because that would be a waste of time.

If I'm writing a custom query, I'm probably not looking to integrate into the model anyway -- if it's used for reporting I'd just take the results as is. If I'm writing a query specifically to get a matching model object (to manipulate) I'm going to get the whole model and no stitching would be required.

timdev2 · on Sept 20, 2019

> you automatically invalidate one of ORMs biggest selling points which is being SQL-database agnostic.

I haven't heard anyone talk seriously about database-agnosticism since the very early 2000s. Maybe some commercial products still try (choose MS or Oracle!), but it's rare nowadays.

The primary selling point of an ORM is that it abstracts marshaling/un-marshaling rows to/from entities. Instantiating and persisting entities to relational storage.

> And by "drop into", this typically means writing custom stitching code that stitches the SQL cursor results back into the models again. It's rarely straightforward.

That's not typical in most uses I've seen. Far more typical are things like:

- Go straight to SQL for reporting, since that's what SQL does. Useful in reporting contexts, and also for list/filter UI screens.

- Use raw SQL to query a list of entity IDs for updating based on some complex criteria. Iterate over the identifiers and perform whatever logic you need to before letting the ORM handle all the persistence concerns.
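The second pattern can be sketched roughly like this in Python. There is no real ORM here: `load_account` and `save_account` are hypothetical stand-ins for whatever load and persist calls an ORM session would provide, and the schema is invented. The point is only the division of labor: raw SQL picks the IDs, per-entity code does the rest.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Account:
    id: int
    balance_cents: int
    status: str


# Hypothetical stand-ins for an ORM's load and persist operations.
def load_account(conn: sqlite3.Connection, account_id: int) -> Account:
    row = conn.execute(
        "SELECT id, balance_cents, status FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return Account(*row)


def save_account(conn: sqlite3.Connection, account: Account) -> None:
    conn.execute(
        "UPDATE accounts SET balance_cents = ?, status = ? WHERE id = ?",
        (account.balance_cents, account.status, account.id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance_cents INTEGER NOT NULL,
        status TEXT NOT NULL
    );
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        paid_on TEXT NOT NULL
    );
    INSERT INTO accounts VALUES (1, -500, 'active'), (2, -200, 'active'), (3, 100, 'active');
    INSERT INTO payments (account_id, paid_on) VALUES (2, '2019-09-01');
    """
)

# Raw SQL expresses the complex criteria: overdrawn accounts with no payment on record.
ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.id FROM accounts a
        WHERE a.balance_cents < 0
          AND NOT EXISTS (SELECT 1 FROM payments p WHERE p.account_id = a.id)
        ORDER BY a.id
        """
    )
]

# Per-entity logic runs on loaded objects; persistence goes through the stand-ins.
for account_id in ids:
    account = load_account(conn, account_id)
    account.status = "suspended"
    save_account(conn, account)
conn.commit()

statuses = conn.execute("SELECT id, status FROM accounts ORDER BY id").fetchall()
```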

falcolas · on Sept 20, 2019

> I haven't heard anyone talk seriously about database-agnosticism since the very early 2000s.

Do you use the same database engine for your unit and integration testing as you do production? I don't. I use sqlite for unit and local integration testing, and aurora-mysql for production.

As a side note, I quite literally can't use aurora-mysql for local unit and integration testing. It doesn't exist outside AWS.

ratww · on Sept 20, 2019

> I don't. I use sqlite for unit and local integration testing, and aurora-mysql for production.

That's a recipe for tests that don't catch edge cases.
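One concrete example of the kind of edge case that can slip through, as a small Python sketch with an invented `items` table: in an ordinary SQLite table, a non-numeric string goes into an INTEGER column without any error. Whether the production engine rejects or coerces the same insert depends on its configuration, which is an assumption this sketch cannot check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, quantity INTEGER)")

# SQLite's type affinity coerces '42' to an integer but stores 'lots' as text,
# without raising an error, in an ordinary (non-STRICT) table.
conn.execute("INSERT INTO items (quantity) VALUES (?)", ("42",))
conn.execute("INSERT INTO items (quantity) VALUES (?)", ("lots",))

stored = conn.execute(
    "SELECT quantity, typeof(quantity) FROM items ORDER BY id"
).fetchall()
```

A test suite running only against SQLite would pass this insert, so a bug that writes the wrong type could go unnoticed until it reaches an engine that enforces column types.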

sidlls · on Sept 20, 2019

Unit testing code that touches the database is not useful, and in fact it indicates there is likely a design flaw. Code that acquires from or changes data in the DB should be self contained.

An integration test that doesn't use the same DB as production is unsatisfactory.

philwelch · on Sept 20, 2019

Using a DB inside your unit tests is an antipattern and arguably a violation of the concept of a unit test in the first place.

Integration tests should run against a test environment, otherwise, what integration are you testing? I don't see the value in writing integration tests that test the integration between my code and a one-off integration test DB that exists solely for the purpose of integration testing.