Hacker News new | past | comments | ask | show | jobs | submit login
ORMs don’t kill databases. Developers do. (arrangeactassert.com)
60 points by jagreehal on Aug 30, 2011 | hide | past | favorite | 51 comments



Summary: ORMs are a leaky abstraction.

Thus, it is true that developers are responsible for knowing what is going on under the hood and either contorting how they use the ORM to get what they want or skipping it when necessary.

And it is also true that ORMs are to blame, they do not work as advertised, and often impose extra conceptual overhead because you end up composing SQL in your head and then trying to figure out how to get the ORM to generate the SQL, instead of thinking in the ORM's abstraction.


That may be true, but that's what programming is for the most part. Pretend like it will just work and then optimize later when it is proven to be too slow through profiling. In this case, you need performance data from the database. Use that data to improve anything that needs it.


Does it work 99% of the time though?


"a bad workman always blames his tools"

ORM's are just a tool, some better than others. Developers need to know SQL, relational algebra and normalization. There is no excuse for not knowing proper databasing, ignoring the SQL generated by your ORM and blindly following along. A good ORM (like SQL Alchemy) will grow with you from the simplest of cases to the complex.

Personally, I have never developed an application with an ORM and ignored the SQL generated. Most of the time the SQL generated by the ORM is acceptable but there are instances where the naive approach with an ORM generated piggish SQL and needed to be reworked.


I think your last point is right. It's not just about knowing your tools, it's about knowing how your tools fit into the system that you've built.


You always hear this from somebody who is new to ORMs, spends 3 months working on a project, and then is amazed because he doesn't like one thing or another in the file.

It'll be funny when he discovers that with just a few other lines of code he can not only fix that issue, but also completely change how data is stored in the database, use 1st and 2nd level caching for the objects, and be able to easily refactor all that stuff when the business logic changes.

By the way, I seriously doubt that query is the cause of performance problems in his system.


Spot on. ORMs can generate some crazy-bad SQL if you're not careful with them. A good ORM will generate great SQL in the common case and allow the developer to execute custom SQL without fuss when necessary. If your ORM makes it a pain to pump in raw SQL in place of generated SQL, find a new ORM.


"beautiful" sql is subjective. What does it mean? SQL that looks pretty to the eye (with proper indentation)? Or is it SQL that can run on dozens of different database servers with no modifications? Or is it SQL that can be automatically generated so you don't have to spend any time thinking about it but that runs 5% slower than something you spent 1 hour on?

I would probably go with the two last options.


The typical use of ORMs that kills the database is where one uses record-based operations to traverse a set: resulting in one query per record.

The blogger's point is quite good: a developer probably doesn't know what SQL is being generated, it may be opaque how small variants in query source can result in radically different SQL based on the translation process. Hence, a profiler should be used to find problematic queries. This is solid advice.

What's not mentioned though is that sometimes what appears to be very ugly SQL actually out-performs pretty, parsimonious, hand-crafted SQL. This, assuming the developer could actually craft correct SQL equivalent.


ORMs aren't the problem here, databases not designed for use by programmers is the problem.

ORMs exist because straight SQL isn't a good interface for programmers, and ORMs are bad/a leaky abstraction because the problem is really messy.

Instead, let's just build a relational database that 1) has an API that is useful to programmers i.e. not SQL 2) supports explicitly specifying a plan, not providing hints or clues, but using operations that look map, filter, reduce, hash-join, etc.


Wrong. Relational databases are one of the fundamental components of most large scale applications and SQL is usually the interface. If you are going to be a journeyman or master programmer in this area you must understand it. If you dont, you are just an apprentice. Just like you must understand networking, concurrency, algorithms, data structures, and hardware. If all you understand is Java an its brand of OO then you are a trade school mechanic.

I know this is snarky as hell and I apologize for being offensive. But I am not an MIT CS graduate. I am completely self taught. SQL has been a necessary skill in every job I've had in the last 20 years. So when users were complaining were complaining that a data download to csv files was limited to 200 records (due to a dozen joined tables, lazy loading, table locking, using velocity to generate csv files, etc), I threw out Hibernate and velocity and used SQL, JDBC, and direct HTTP output to generate 50 times the output in one tenth the time.


Just because you have an example of a probably poorly designed, over-engineered system that could be completely replaced with some hand written SQL code, doesn't mean that is the best solution for everything. Come back when you have an example of a system running on top of 500 tables, doing read-writes in a big iron server, communicating with dozens of other servers, that uses distributed caching to improve performance, and that you can explain how you can replace an ORM with SQL.


My point was not that ORMs are bad, but that SQL is required knowledge to communicate with a databases, including getting around the problems an ORM causes. It is hard for me to believe that someone who doesnt understand SQL can be a competent ORM programmer.

The system I mentioned was well designed with hundreds of tables, multiple replicated database servers, dozens of web servers, caching and all that. The problem was a specific case where knowledge of SQL was able to produce a 50x performance improvement.

I think SQL essential to learn because it bends your mind. You have to think in terms of sets instead of sequential processes. Just like functional programming and BNF are mind stretching.


cool, we agree on that.


I'm doing something similar.

We migrating billing system with Hibernated business logic into pure SQL.


Sigh. Maybe I wasn't clear, but two clues in what I said, "straight SQL isn't a good interface for programmers", and "let's build a relational database"

Every time I bring this up, someone misunderstands me. I'll explain it again: I want a real RDBMS, I want ACID, I want transactions, constraints, triggers, foreign keys, etc, I just don't want the brain-dead SQL the language to interface with it. Preferably something with lispy syntax and a manual override on the planner (not hints, but full-blown "I'm in control now")

Here's what I wrote about it a few years ago: http://arohner.blogspot.com/2009/04/db-rant.html


Yes, please.

While we are at it, can we get rid of the set stuff? I want to think in terms of indexes. Or pipes. Or something.


"ORMs aren't the problem here, databases not designed for use by programmers is the problem"

I can paraphrase it as:

ORMs aren't the problem here, relational databases are problem.

I don't think that letter R is to blame. Letter O is culprit.

Why? Because OOP is not suited to work with data.

And what's wrong with SQL? It's a good enough abstraction from actual data.

I wonder: who needs another (letter O) layer over already abstracted layer?

"2) supports explicitly specifying a plan, not providing hints or clues, but using operations that look map, filter, reduce, hash-join, etc."

You have it already: SELECT type, SUM( -- reduce object_id * 2 -- map )

FROM sys.objects WHERE -- filter type = 'U' GROUP BY type

Hash-joins - yes, only hints, and not in all DBs


> I don't think that letter R is to blame. Letter O is culprit.

That's what the disagreement boils down to, I suppose: if we agree that there's an object-relational mismatch (http://en.wikipedia.org/wiki/Object-relational_impedance_mis...), then is the solution to drop the 'O', to drop the 'R', or to find a way to bridge them?


My not humble opinion: so called "Object-relational impedance" is inevitable.

Why?

Because on one side we have Relational Algebra - simple clear theory (I can explain it to 8 year children).

And on other side we have something amorphous, defined in nonexplicit terms. I mean OOP.

Objects are bad for data. So, just drop "O" completely.


Or use an object database like db4o if you want to eliminate the mismatch :)


Sorry.

I can't edit my answer, just look at this:

SELECT type, SUM( -- reduce object_id * 2 -- map )

FROM sys.objects WHERE -- filter type = 'U' GROUP BY type


"Error establishing database connection"

ORMs don't kill databases. Hacker News does.


Sad but true! Bluehost should have sorted the issue...


"Error establishing a database connection"

Heh.

Anyway, cached text from Google:

----

Following a recent post I wrote Phillip Haydon made an excellent argument about not using LINQ in ORMs like nHibernate and Entity Framework without consideration to what they are doing.

Something I definitely agree with.

    Just as guns don’t kill people, ORMs don’t kill databases. Developers do.
I should have picked up on this because it’s happened to me on a project I was working on. Death by ORM During development everything was performing well. So well that even the most sceptical developers decided to use ORMs instead of stored procedures.

The application was responsive and users were happy.

However in production there has no happy ending. The application went from running like sh!t off a proverbial shovel to just sh!t.

It was a nightmare and the finger pointing began.

The problem was we assumed (always a bad thing) the queries generated by the ORM would be as good as if we had written them ourselves.

So when we looked at what was being generated it came as a bit of a shock to see what was going on.

But while it was easy to use the ORM as a scapegoat the reality was developers were at fault.

Here’s a simple example of how even the smallest of changes can make a difference in the SQL ORMs generate.

In the code below all we want nHibernate to do is select the of the fruit that is NOT also a color.

    [Test]
    public void Tale_Of_Two_Queries()
    {
        var fruitDatabase = new FruitDatabase();
        using (ISession session = fruitDatabase.Session)
        {
            using (ITransaction txn = session.BeginTransaction())
            {

                var apple = new Fruit() { Name = "Apple", IsAlsoAColor = false};
                var orange = new Fruit() { Name = "Orange", IsAlsoAColor = true };

                session.Save(apple);
                session.Save(orange);
                txn.Commit();
            }

            var firstQuery = session.Query<Fruit>()
            .Where(f => !f.IsAlsoAColor)
            .Select(f => f.Name)
            .ToList();        

            var secondQuery = session.Query<Fruit>()
                    .Where(f => f.IsAlsoAColor == false)
                    .Select(f => f.Name)
                    .ToList();
        }
    }
For the query that uses !f.IsAlsoAColor the SQL query looks like this

    select fruit0_.Name as col_0_0_
    from   Fruits fruit0_
    where  not (fruit0_.IsAlsoAColor = 1)
For the query that uses f.IsAlsoAColor == false the sql generated looks like this

    select fruit0_.Name as col_0_0_
    from   Fruits fruit0_
    where  case
             when fruit0_.IsAlsoAColor = 1 then 1
             else 0
           end = case
                   when 0 /* @p0 */ = 1 then 1
                   else 0
                 end
While this is a trivial example it highlights the fact you don’t know what the generated SQL will be, let alone the execution plan.

And if your query returns the results you want are you really going to spend time finding out what’s going on ‘under the hood’? How comes nobody talks about this?

I can’t ever recall a presenter warning about this during any ORM talk I’ve been to. The focus always on what it can do and how it compares to other ORMs rather than best practices. It’s not just a .Net problem either

After talking to Ruby on Rails developers this is a problem for them too, because Active Record doesn’t work 100% of the time. When it goes wrong they have to implement workarounds to execute queries without using active record. So how can I make sure it doesn’t happen to me?

The first thing is to know what issues to look out for. The nHibernate profiler alerts page is an excellent resource for this.

With this information you can use profiling tools to see what queries your ORM is generating and how your application is performing.

My last tip is to always use the best data access method for what you’re doing. It’s easy to become complacent or aim to everything using an ORM, but if it makes more sense to use a stored procedure then that’s what you should do.

It would be good to hear what your experiences are with issues like this and to share any tips/tools you have used to avoid ORM performance issues.

In the meantime I’ll leave you with this excellent DBA vs. Developer (Star Wars Style) video


What seems to be suspicious SQL (the CASE/WHEN/ELSE) is probably not the problem. This is classic boilerplate SQL "cast" code to make a boolean value for databases that don't support booleans. Distracting, perhaps, but not a smoking gun. Likely the problem with something like this is that there's a full-table scan on the Fruits table, and that the function doing the scanning is called frequently without caching.


> What seems to be suspicious SQL (the CASE/WHEN/ELSE) is probably not the problem.

It certainly could be the problem. Like you say, a CASE becomes a computed field and requires a table scan. A simple check for IsAlsoAFruit can be done with an index scan, assuming it has that index. This can be a huge difference in data scanning and memory load. Of course, there are also plenty of times when SQL will spurn a boolean index because it's not selective enough.


"Likely the problem with something like this is that there's a full-table scan on the Fruits table"

What do you think the DBMS should do for a table with two tuples?

Regardless, it should be easier for the DBMS to use alternate plans if simple expressions aren't hidden behind a bunch of useless CASE statements.


One Rails project I had got sidelined for a good 6 hours because ActiveRecord requires IDs on everything. The documentation says that you can skip IDs, but when you try to save it crashes every time.

Go figure whose fault that one is.


I'm interested - where did you learn that ActiveRecord does not require IDs?

The only case where I can think of where it doesn't require IDs is when building a join table (for a many-to-many relationship).

In every other case, you need to specify a primary key field (although it does not need to be an auto-incrementing integer called "id") and I'm sure it says as much in the docs.

EDIT: You can do a select and omit the ID field in the resultset, but you cannot save the results back again - simply because it does not know where to write it back. Again, I'm pretty sure I remember reading that in the docs, but I may have imagined it.


http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAd...

    create_table(:categories_suppliers, :id => false) do |t|
      t.column :category_id, :integer
      t.column :supplier_id, :integer
    end
I did this and indexed the table by date, and it works for everything except saving a record. Like I said, an ID/primary key is required though that section of the documentation implies that an ID is optional.


Hmm - yeah the docs are sort of unclear, if you're not familiar with the convention.

:categories_suppliers implies that the table is a join table (a many-to-many join between categories and suppliers) - which is why it has no ID and why it only has the two foreign key fields with no other data.


Yet the assumption that all models require an ID is a fallacy. I was generating reports unique to the day, therefore a simple Date field was adequate as an index/ID. I don't know about other ORMs but when working directly with the SQL it'd be no issue.

I hold out that it's a problem in the design of ActiveRecord to assume that there will always be a primary key. That said, if I can just tell the model "identify yourself by this key" WITHOUT setting it as the primary key I would be happy.



Are you saying that because I didn't set the date field as my primary key, it crashed?


If you are contemplating using or not using an ORM, chances are you are just starting to use a data store. At this point, perhaps you should contemplate whether to use a SQL based database or not!

You can always use a NoSQL database, such as:

* Riak * MongoDB * Redis

or any of the other ones. And if you go with them, you won't need an ORM. If you go with a database that needs SQL, may as well explicitly specify the model methods and their SQL, yeah.


Using ORMs gives developers (and other stakeholders) a way to get through the initial conceptual design and development faster. It helps getting to 'functional' much faster and it makes feedback loops very effective.

However, before pushing to production, you should always have some idea of typical user scenarios. What actions are they taking? With how much data? How often? In what sequence? Then, you load test based on that information. You identify that 20% of the ORM isn't going to work, and you replace it with stored procedures, or in-memory tables, or something.

For someone who grew up pre-ORM with on-staff dba-types who had to redesign their schemas on every little feature change, and thus recreate entire data layers, ORMS are FREAKING AWESOME.

But, you still need someone who knows what they're doing.


I agreed with you until the point that you said to replace things with stored procedures and in-memory tables.

Having the two different contexts for logic, code and SPs makes things a thousand times harder to debug and maintain, making the cost of anything skyrocket.

ORMs are indeed awesome, but the solution for problems like this is not to take ORM out of the way, is to work with it to make it work better.


Why is that more difficult? It seems like it worked fine for two teams I worked with. What are other common or better ways to get around this?


the big problem is that it makes refactoring and changing logic much harder, as you now two different places to maintain business logic.

In my experience, most problems that DBAs would like to use SPs for can be easily solved by refactoring/merging SQL calls and proper caching of objects during a request or session, two things that an ORM really help you with.


I think that object oriented languages (e.g. java) is not well suited for business logic.

Putting all logic in DB also provides natural separation of concerns. UI in one place, business logic in another.


What? So you think Java is not expressive enough for business logic, and you think it's better to move that logic to PLSQL? Seriously?

Java may not be the best language for some things, and that's why you use DSLs, but moving that to the place you store the data is probably the worst thing you can do 99% of the time.


Yes.

Business logic as far as possible must be expressed in declarative way.

Java have very poor facilities for declarative programming.

SQL (not PL/SQL) on other side - is pure declarative language. It's already powerful DSL if you master it.

And yes, I agree with you, that writing logic in PL/SQL (as in imperative language) is equally terrible as writing it in Java.


Omg.. this is getting worse.

SQL is a DSL to deal with sets of data. Unless the application you're writing is a database management app, you're pretty much doing it really wrong, I'm sorry to tell you that.


>>SQL is a DSL to deal with sets of data

What's wrong with this?

Can you imagine process of billing campaign?

Than try it in Hibernated database and in raw SQL.

Pure SQL will be thousands times faster.

ORMs are good only for smth like Django Admin


"Just as guns don’t kill people, ORMs don’t kill databases. Developers do."

Guns do not kill people but, they sure do facilitate killing, don't they?


If you're building a Rails app, just profile your queries (perhaps with New Relic) before pushing to production.


I don't think that NewRelic would help in this situation, as they only display SQL queries and the corresponding execution plan for slow queries in slow transactions. While this is certainly helpful, the blog post is pointing out the need for developers to know what their ORM is doing from the start, not just when a performance problem begins to manifest.

Put another way, if you have good visibility into your system early on, and understand what's going on under the hood, you can avoid performance problems in the future.


I agree with what you said about developers understanding what does on under the curtains, but regarding the NewRelic statement, I couldn't disagree more with this.

Spending time early on in the project on things that don't matter and that don't really impact the system, like the example given in the post, is exactly the kind of thing that ends up creating projects that are expensive to maintain and just bad.

Focusing on solving large performance bottlenecks, that can easily improve things 1000 times more than fixing that simple example is a much better use of your time, and is what newRelic shows you.


This. At almost every level of web development, we're working with code that generates some other kind of code. Ruby -> C, Erb/Haml/whatever template => HTML, maybe Coffeescript -> JavaScript, and finally ActiveRecord/any ORM -> SQL strings.

I write Ruby all the time, I don't know a lick of C. I've got a solid understanding of SQL and relational data modeling, but why should I check the generated SQL of _every_ single query?

Furthermore, if you don't use an ORM, your model's methods will contain a lot of similar functionality. Hmm, maybe I should write a module or base class so I don't repeat myself. Eventually you will have written an ORM.


ORMs dont kill databases. SQL does !




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: