Am I weird for preferring raw SQL over query abstractions like these ORMs expose? IMHO SQL is a perfectly fine language. I do however like letting a library do all the dirty work of turning SQL query output into objects. There's a good talk from this year's PyCon that shows SQLAlchemy's core library and how it gives a nice closer to raw SQL interface to a database: https://www.youtube.com/watch?v=0PSdzUxRYpA
It's an interesting question but I want to make a meta point.
I came here to read people's thoughts on the article but the top comment shifts the debate onto a wider topic.
This happens all the time on HN and is something I find slightly irritating if I'm actually interested in the topic itself. It's like that person at a dinner party who always changes the subject onto what THEY want to talk about.
I'm not blaming you - it's the fact that your post is the top one and the one most commented on that derails the original topic - which is a group decision.
It's not democracy, the top comment isn't the "only" comment as if the will of the majority chose it and that's all that exists in this thread, effectively suppressing minority influence. You have options to view other comments or start your own discussion if you please.
I think your point of contention is that other people, reading this very same article, approve of a comment that did the exact same thing you did: made a point about a "meta"-topic by forking the conversation into some other concept that was assumed when both of you were compelled to comment. Now we have three (your parent, you, and me) off-topic forks of a conversation that is superfluous to the so-called ubiquitous topic of the thread.
I see where you are coming from, but I find these "meta"-topics at times more interesting than the topic itself, hence why I'm commenting.
Interests are interests, your response is just as much justified IMO as your parent commenter.
So find, upvote, and participate in — or, if necessary, create — a thread that offers the kind of discussion you're hoping to see. Coming into a thread that's not the discussion you wanted, and being all, "But guys! We should be talking about $other_thing!" is an even more annoying dinner party guest, IMO.
In my defence - the $other_thing is the actual article though.
It's also that there are some predictable topics. Every post about a new Google product has someone bringing up the Reader shutdown and every post about ORMs has someone talking about how much they prefer raw SQL...
Unfortunately, a problem is that ORMs are goofy: taking a decent model (relational) and mutilating it into a mis-designed one (mainstream OOP). Because of the split between the programming vs. database worlds, where the programmers have the worse solution.
No, you are not weird. Yours is exactly my approach: Hand written queries with help from libraries for writing and, especially for retrieving data from the query back into the OO world.
My interpretation is that the relational and OO models are incompatible. Not 100% incompatible, but with enough dissimilarity to make writing a complete and efficient ORM a near impossible task.
Every ORM can easily do the simplest of mappings. Classes to tables, rows to instances is the very basic stuff. It is when you have complex data structures that the task of the ORM is close to impossible. Either you have a DSL for describing this mapping, or you fall back into the general language the rest of the program is written in.
I see little value in a DSL for describing relational to OO mappings, so I write the mapping in the general programming language already in use, hence I do not find any added value in ORMs.
This is the basis for iBATIS in the Java world. You provide the custom SQL and iBATIS knows how to wrap the results up into objects (as long as you provide the mapping).
So you're happy to generate the SQL via some language-specific API, and you're happy to have the results turned into objects...
What's the problem with an ORM again?
Half the point of SQLA is that the ORM is built on the core SQL generation. It's the least magical and most RDBMS-friendly ORM I've ever seen. It's designed for people who already grok SQL, not people who'd like to forget that it exists. You can even jam hand-written SQL into the ORM if you so please.
How do you do that without massively violating DRY? I've done tiny little personal projects where I use raw SQL, then eventually I find myself storing parts of queries into variables and concatenating them later, and it doesn't take long to realize I should just use a decent ORM.
It depends on the project. Quite often there simply isn't that much repetition.
On the flipside, how do you feel about slower performance? About diagnosing problems through multiple abstraction layers? Or about having different abstraction layers for different languages?
Stored procedures are great for solving some of that pain. For anything really serious you might want to go all the way and only use stored procedures so there's no ability/risk of your app code ever making raw SQL calls.
Stored procedures are a fucking pain. Now you've managed to split your logic in two completely different locations, and you get to enjoy rolling out new versions of the procedures in their own migration. Not to mention you're tied to your database.
On the other hand, ORMs can be a huge source of inefficiency if you don't look at the generated SQL and use lazy collection thoughtlessly, though they avoid a lot of boilerplate when saving a complex object graph.
A happy(?) medium is probably something with a powerful data retrieval API, which makes for composable queries (SQL query strings compose very badly).
In some environments, having the logic in the database is actually the better choice. When you are dealing with data that is in the million plus rows, there is more to lose from the separation of logic than just letting the database do what its good at while adding your logic.
All these discussions of which is better (ORM or straight SQL) should be put in context of the type of application, the amount of data and the complexity of the schema.
I have worked with systems which constrain all logic to the application server and when the data set is small it works fine, when its not, its a nightmare. Why should you bring down a million rows down to your application layer just so you can update a couple of columns. Doing this in the database as a set operation becomes trivial and fast.
No doubt you occasionally need to push logic down to the database, much in the same way that you sometimes need to uglify your code to make it fast enough, but in my experience, it's a good way to get a painful-to-manage codebase full of fantastic ideas such as "let's build this complex XML structure by concatenating strings".
Can you explain? I have no problem and have used or written my own SQL composition libraries pretty easily. It is code-composition, no doubt, but the BNF for SQL is very easy to grok even given the variations in platform.
Raw SQL is terrible when you need something a bit complicated. Want to do a complex search? Unfortunately, your coworker forgot an " AND " in a little-used function, which is going to blow up on you at runtime. Or maybe you didn't count the placeholders right and you inserted the wrong numbers values. Can you make that work for you with a library? Sure. But on its own, it's just bad.
On the other hand, SQLAlchemy's query API is quite nice, and composes just fine.
I personally share your tendencies. But what I'd say is that SQL Alchemy provides a great substrate API for doing exactly what you are talking about.
In those cases where you need to compose SQL, it can save you a ton of time with a SQL composition API (rather than building your own strings).
That is the basis of how SQL Alchemy builds up its ORM functionality. So if you are working on a project where you can save some time using the ORM, but prefer to work closer to SQL, you've got a great mix.
I prefer SQL as well. My theory is that it's because I started as a developer early enough that I had no choice but to learn SQL very well, and since I did, there's no motivation for me to add extra layers of abstraction/indirection to my code and deal with all the costs.
Personally, it depends. For doing anything somewhat complicated I usually prefer raw SQL - more compact, generally far more flexible, and I can trivially do dangerous crap like string interpolation. Translating the end result to objects is usually pretty simple. With a few exceptions, this has been for specific performance boosting hacks in bottlenecks, and site-wide statistics displays.
For anything simple, I vastly prefer ORMs. Joins and single selects are incredibly verbose and error-prone compared to "user.books" and "user.address" or the equivalent, and the added layer of abstraction means it's easier to modify the table without modifying code.
There's a tipping point. Eventually, you're likely to end up with half of a poorly-specified, bug-ridden implementation of an ORM if you don't use one at the start.
However, sometimes it's much simpler to just write custom SQL.
I'd say it's exactly the other way around: Eventually you end up being constrained by whatever ORM you're using, and you'll wish you had just used SQL from the beginning.
I will say, though, that SQLAlchemy is very nice for an ORM.
With the addition of custom lookups and transforms in Django 1.7* I'd be curious to know where it still falls behind other ORMs. I've only needed a couple of bits of raw SQL across a dozen or more Django projects.
I've been working with Django for 5 years. In my experience, Django's ORM is pretty good for the common cases (specially when your query maps one-to-one to a single model), but for anything that you need a bit more flexibility (dealing with scalars, GROUP BY, a custom JOIN) you're on your own with raw SQL, which gets unmaintainable quickly since you can't chain it with QuerySet and suddenly you're not compatible with the rest of the codebase.
SQLAlchemy is more interesting in that it doesn't hide SQL away, just provides an API for it. Underneath everything you're still dealing with strings (you can just print most objects, like queries, columns, expressions, SQL functions, etc), so you're less likely to be suddenly incompatible with the rest of the API when doing something that deviates a little more from the common cases.
I had to use ActiveRecord last week after a year straight of using SQLalchemy and it was downright painful. doing anything beyond simple CRUD is near-impossible without writing raw SQL. i used to like ActiveRecord more when I didn't understand relational databases, but now I don't understand how anyone who's ever used SQL directly in their life could tolerate AR.
maybe this has changed, but arel's documentation last time i used it was about as good as the rails documentation, so it was wildly incomplete/inaccurate.
for example, last week I wanted to a bulk INSERT. maybe arel can do this (though I highly suspect it cannot since this isn't a part of relational algebra at all), but that's kind of worthless if i can't find any evidence of how to do it without reading the arel source.
I'd love to see a comparison with non-Python ORMs. I've used Sequel for Ruby and it's wonderful but, even though I've used SQLAlchemy as well, I don't have a sense of pros and cons of their respective general approaches.
I use SQLAlchemy for just about everything won't suffer too much from the expected ORM overhead. I've found it to be powerful, intuitive, and well suited to larger projects.
Are there many other Python developers working directly with libpqxx though? I've recently started exploring the idea of moving some of my relatively stable, postgres-specific code into a dedicated library and would be grateful for tips or words of caution. My hope is that this approach will wind up being useful in cases where I'm coding against a large body of pre-existing stored procedures.
I've used Doctrine 2 with PHP, and am yet to find another ORM like it in any language. It stands out as unique (to me) in that you just write plain old classes which don't need to know about the ORM. You write a separate schema which Doctrine reads and then uses proxy objects for the mapping.
Am I the only person that finds this preferable to coupling your objects with the ORM?
It's been a while, but I enjoyed using Castor before other approaches became so widely adopted in the Java world. Does it seem conceptually similar at least?
Interesting, thanks. I had heard of Hibernate before but never actually looked into it. I can definitely see the similarities - especially between Doctrine's DQL and Hibernate's HQL.
some people love this idea, and for many years we've had a product called SQLSoup (https://sqlsoup.readthedocs.org/en/latest/) which does exactly this on top of the SQLAlchemy ORM, and the docs for pybean look quite similar.
However I don't have the resources to maintain this very old project right now (it's only about 500 lines, anyone can pick up the source if they cared).
A more SQLA-centric version of this idea is recently released as the automap extension (http://docs.sqlalchemy.org/en/latest/orm/extensions/automap....) which includes the "map everything on the fly" step plus relationship support, and you then use traditional Query patterns with it. Reaction to it has been mixed, depending on where the user is coming from.
Slick (http://slick.typesafe.com/) seems to support this as well in a type-safe manner, but personally I'm not too keen on having the compilation depend on having a stateful database available.
No mention of Active Record vs Entity Mapping ORMs? Are there not many Entity Mappers in existence? The only popular one I know of is Symfony's Doctrine (in PHP land)