Everything you wanted to know about SQL injection (but were afraid to ask)

mrweasel · on July 30, 2013

Arguably, I worked with people who didn't know of the concept of "prepared statements", but these are the same kind of people that won't read this article. I think prepared statements should have been highlighted more.

ORMs makes things alot simpler, in some cases, and get in your way more often than not. I don't do much .Net anymore, mostly Django stuff, but in both cases the ORMs just forces me to learn a new query syntax. In the end prepared statements and SQL would have been quicker, more flexible... and perhaps safer. I get the concept of the ORM, but I don't really plan to switch database anytime soon and I have needed for the flexibility of having "no ORM". ( Could that be a thing, like No-SQL, No-ORM? )

The funniest SQL injection attacks I've seen have been against our search pages. Attackers assume that search is done as dynamic MySQL queries, that fact that you are actually using Sphinx or Solr seems to escape most wanna be hackers.

zalew · on July 30, 2013

> I get the concept of the ORM, but I don't really plan to switch database anytime soon

not sure if you get the concept of ORM, because switching databases is not what ORMs are primarily designed for, it's more of a common feature / side effect that may get invalidated when you use unsupported dbms features.

> and I have needed for the flexibility of having "no ORM".

http://docs.sqlalchemy.org/en/latest/core/index.html

http://docs.sqlalchemy.org/en/rel_0_8/core/tutorial.html

or a clojure example http://sqlkorma.com/ you can probably find something similar for python, there are a few lightweight sql generation libs.

> The funniest SQL injection attacks I've seen have been against our search pages. Attackers assume that search is done as dynamic MySQL queries, that fact that you are actually using Sphinx or Solr seems to escape most wanna be hackers.

bots. they see a form, they submit crap to it.

dustingetz · on July 31, 2013

anti-ORM is not exactly a new sentiment - ORMs have been a hot & unresolved debate for two decades - but you need to be very careful about what you read on the internet, a lot of people express reasonably written opinions with an authoritative tone (like me) but are just wrong (hopefully not me haha). Find someone smart that you trust, like Rich Hickey or Greg Young, and try to understand what they say.

twistedpair · on July 31, 2013

Odd that few articles champion a limited database user. In my apps there is a crippled read only user, an update user with limited UPDATE/INSERT privileges, and finally an admin user that is in no way accessible from the app.

There is no reason for your standard app user to have access to the core database object tables to do DDL lookups or mess around with the internals. Further, most queries should only run through the RO user. Finally, no user for your app should have DROP or SUPER privileges. You also should disable multi-query requests out the gate.

Of course the dev's will need to remember to use the proper set of creds from their favorite SQL IDE, and there will be a few hiccups for 10min until you've got the proper creds set for the app user, but then you can limit the worries from an attack.

welder · on July 30, 2013

I take the exact opposite view. I use an ORM where you think in SQL (SQLAlchemy) and can build sql queries using functions named the same as the SQL.

If you use an ORM then SQLi should be impossible, unless there's a bug in your ORM.

crazytony · on July 30, 2013

as you point out you can build the sql query. I've not seen any ORM that validates the SQL you pass it to build the query. This is how SQLi happens in the ORM tier.

objects = orm.rawQueryForObjects("select * from people where name = '" + name + "')

welder · on July 31, 2013

You build the sql query using chained functions, so it's safe:

objects = People.query.filter_by(name=name).all()

When I said 'build the sql query' I meant using SQLAlchemy core, which is also safe:

objects = select([People]).where(People.name == name)

crazytony · on July 31, 2013

Even sqlalchemy has the mechanism I pointed out. I've never seen an ORM not have it because it becomes important if you're putting an ORM on top of a previously designed database or optimising queries.

session.query(Object).from_statement("SELECT * ...")

http://docs.sqlalchemy.org/en/rel_0_8/orm/query.html#sqlalch...

X-Istence · on July 31, 2013

Sure, and shooting yourself in the foot is also possible by pointing the gun down at the floor ... doesn't mean it is a good idea.

Generally the same statement can be built using internal SQLAlchemy...

But if you want to do your own, you can in sqlalchemy while still being as safe:

http://docs.sqlalchemy.org/en/rel_0_5/sqlexpression.html#usi...

roc · on July 31, 2013

> "Sure, and shooting yourself in the foot is also possible by pointing the gun down at the floor ... doesn't mean it is a good idea."

But it's possible and people do it. Which is, I believe the point was, the counterpoint to "SQLi should be impossible".

Sure, One can avoid shooting themselves in the foot with an ORM. But that's also true in SQL.

jcampbell1 · on July 31, 2013

Prepared statements don't handle all cases, e.g. "ORDER BY :?" doesn't work.

yogo · on July 31, 2013

I think you can add your own kind of preparation/binding on top of the db engine's binding by introducing a little syntax for those cases. Basically what I'm saying is to properly sanitize what gets inserted then perform a string replace on the query string. It comes in handy for other clauses that normal binding does not cover.

An example would be you would have your binding take care of something like :foo: as a sort of preprocessing before the db takes care of the others like :foo

chiph · on July 31, 2013

Another example (SQL Server specific, perhaps) is parameterizing the COLLATE option. You just can't do it, so you have to string-concat it, which means that your query cache has n versions of each query, where n is the number of locales your app supports. Good thing RAM is cheap.

AaronBBrown · on July 31, 2013

And of course, the highly relevant classic XKCD: http://xkcd.com/327/

sehrope · on July 30, 2013

Both raw SQL and using an ORM have their place. The latter is definitely less prone to unintended SQL injections but it's still possible. The reverse is true too. Raw SQL can be quite secure if you're not an idiot about it (golden rule: never build command strings from user input). If you use prepared statements across the board you don't even have to bother with sanitizing user input[1].

The main reason to use an ORM is for programmer efficiency. Sure I can write a bunch of CRUD operations but why bother if I can have the ORM do it for me? Since CRUD operations are either inserting a new row or accessing/updating by a primary key it'll be indexed as well so no perf issues[2]. The additional work by your app code for the ORM library is meaningless compared to the DB round trip anyway. My favorite part? Adding a new field goes straight to the model and nowhere else[3].

The big place for raw SQL is as the secret sauce on the meat and potatoes of your apps. Once you've got a real data set and you want to combine, slice, dice, you're not going to do that through an ORM. You want those queries to be as performant as possible. If you've structured your tables properly (proper foreign keys, normalization, etc) then the writing custom SQL will also be much more straightforward then trying to kludge together ORM commands to do what you want. On top of that, the work will happen on the database where it can filter it prior to your app processing it.

For our app[4] most of the CRUD pages are handled by an ORM but there's quite a bit of custom SQL too. One example is for security authorizations. Validating security authorizations (can user X access DB y) is a hierarchical check. The logic is all done in a Postgres stored proc (well technically a function). Doing it via an ORM would inefficient, both in programmer time and computer runtime.

[1]: You probably should though. It's generally a good idea to have some kind of white listing for what is an acceptable value for a field. Either way though you need to make sure you escape them when outputing HTML for webapps to prevent XSS. That combined with prepared statements when interacting with user inputs is the only right way to do things.

[2]: For basic CRUD operations and even simple one-to-many lists. Beyond that things can and do get hairy but those aren't the majority of cases. The majority of app code is vanilla id-based select, inserts, updates, and deletes.

[3]: http://en.wikipedia.org/wiki/Don%27t_repeat_yourself

[4]: http://www.jackdb.com/

ecopoesis · on July 31, 2013

I'm not convinced ORMs ever have a place. Using SQL do do CRUD-type things just isn't that hard, and using the raw SQL is generally much clearer.

I think the Anorm developers say this best: "SQL is already the best DSL for accessing relational databases. We don’t need to invent something new." http://www.playframework.com/documentation/2.1.1/ScalaAnorm

dustingetz · on July 31, 2013

meh, i have used anorm. their site jabs at ORMs and jabs at Slick (typesafe SQL dsl in scala); but the wording of both jabs make it clear that they've missed the point of both tools. Slick is not about types - its about adding one-off functions to remove all the repition from complex SQL - and ORM isn't about hiding SQL under the hood, it's about expressing very complex queries at a higher level than is possible in SQL. Just because someone wrote it on the internet does not make it so.

If your app is simple CRUD on 10 tables, anorm is great. But even for CRUD on complex types (nested objects, objects with collections of objects, objects with values that aren't primitives e.g. dictionary types where possible values aren't known until runtime...) those queries will be crazy complex and repetitive in raw SQL and you will quickly see you need an orm (or move off sql to some other acid alternative like datomic). Anorm would be an ideal layer to write your own ORM in though.

asdasf · on July 31, 2013

That seems totally backwards. If your app is a simple CRUD app, then ORMs save you time. For complex apps, ORMs generally can't even express what you want, or can only do so in incredibly inefficient ways, so you end up dropping down to SQL anyways.

dustingetz · on July 31, 2013

"ORMs generally can't even express what you want, or can only do so in incredibly inefficient ways, so you end up dropping down to SQL anyways"

on the project/team i work on, this statement is false. If it's true for some other team, maybe they are "doing it wrong". I'm sure it depends on one's particular ORM implementation, and whether it is any good, and what it's design goals are.

asdasf · on July 31, 2013

What ORM are you using? I've never seen or heard of one that people didn't have to drop down to sql with (or a pseudo SQL language that is converted directly to SQL).

dustingetz · on July 31, 2013

a custom one actually, and it does have tons of warts and i'd rather be using datomic, but it does an exceptional job of expressing high level queries that expand into a mess of nested SQL expressions.

atwebb · on July 31, 2013

I used to be of this train of thought. It took a long while and a few different options. I'm now of the opinion that it's like everything else I encounter, sometimes it has a place.

I don't like the large ORMs, at all, though. I've used a few Micros (C#) and settled into PetaPoco for a few tasks which are CRUD heavy to allow internal users to maintain some data and it has saved me a lot of time.

I can write the hell out of some SQL, but sometimes I don't need to be doing it, it's just dead simple stuff. There can be concerns for people currently unfamiliar with SQL which I understand.

kazagistar · on July 31, 2013

Yes, but that is exactly the point GP was making: ORM's are a DSL for automatically generating CRUD-type applications, that happen to be a passable language for accessing relational databases. Meanwhile, SQL is a DSL for accessing relational databases, but it takes more work to make CRUD-type applications. Hence, there are use cases for both.

alexjeffrey · on July 31, 2013

Imagine how slow progress would be if for every different condition you had to either create a new query or alter an old one, with the latter potentially meaning you have to add new JOINs, adding weight to all other queries based off that string. It's very painful - I should know, this was my reality for a few months before implementing an ORM over the top.

asdasf · on July 31, 2013

If you want to abstract part of a common query, then do so. That is what views are for. Everyone trying to sell the world on ORMs always seems to make this mistake. You can't say "if you do things the worst possible way now, an ORM can help you!" and expect me to be excited. I don't do things the worst possible way, I already get the benefits you mention from my database.

dustingetz · on July 30, 2013

i think your third paragraph is more or less false for apps over a certain size, think enterprise-y stuff. it is certainly false for the class of apps that i work on - big pharma regulatory compliance software, very complex data model, hundreds of tables etc. Our ORM can express higher level queries and a wider set of queries than raw SQL - you can express a query in one line (one thought) that compiles down to quite a few nested SQL expressions. Finally, the problems of orm stem from the fundamental nature of SQL, so dropping into raw sql couldn't possibly fix them. You need something like Datomic or CQRS/ES to remove the object/relational impedance mismatch at a fundamental level. (This is analogous to why Git rocks compared to SVN; Git solves the problem of centralized mutable state at a fundamental level which opens the door for a better model of the problem and more powerful abstractions.)

taspeotis · on July 30, 2013

> big pharma regulatory compliance software

I hear that these sorts of things use EAV [1], which is traditionally something that ORMs do not handle well. The rationale of EAV over another model is that entities have MANY optional attributes and you'd have horrible data sparseness (without EAV).

But you say you use an ORM:

> Our ORM can express higher level queries and a wider set of queries than raw SQL

Are the anecdotes I hear about using EAV in these sorts of applications right or is the problem domain so big that there's room for EAV and non-EAV and nobody's wrong?

Just curious.

[1] http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%...

siculars · on July 31, 2013

Whereas traditional EAV is a triple - entity, attribute, value - Datomic is a quad. Datomic adds a timestamp and treats every value as an immutable fact at a point in time. Which is exceptionally valuable for highly regulated systems like finance, pharma and healthcare.

taspeotis · on July 31, 2013

I will look at this EAVT system a bit more closely. Thank you.

dustingetz · on July 30, 2013

yeah, Datomic is EAV (technically EAVT, it's event sourced so there is a notion of time and reads/writes need be acid with transactions and stuff) and EAVT can express even higher level queries than an ORM can, mostly because you can cache the indexes locally so you can do consistent read queries in your app process (Datomic's query engine is a library that runs inside your app, if cache is hot reads don't touch network... like git). From my understanding it doesn't have much to do with sparseness, though you certainly can support sparse objects performantly with EAVT.

EAVT/Datomic is a better fit for apps with complex structured data than an ORM over SQL, but migrating there is no trivial feat. Nor is convincing my customers in their due diligence phase, as they have surely been burned before by some kid pushing MongoDB, but they understand SQL and know the product can be successful with SQL, they know they can go get Oracle consultants to save their ass in 10 years when my company is sold, etc.

taspeotis · on July 31, 2013

I appreciate the explanation.

wglb · on July 31, 2013

With regard to prepared statements, If you use prepared statements across the board you don't even have to bother with sanitizing user input

This is not always the case. Consider an application that presents a table, and that the application can sort on any of the columns presented. Perhaps there is a dropdown, listing all of the column names.

Column names not parameterizeable in prepared statements. Thus, you will need to do input sanitization in this case.

sehrope · on July 31, 2013

> Column names not parameterizeable in prepared statements.

You should still use prepared statements for these types of queries as more than likely they will have parameters as well. You might end up with a couple more SQL statements in you statement cache but that should be a non-issue and if anything be faster, not slower.

> Thus, you will need to do input sanitization in this case.

If you're dynamically sorting based on user input then you shouldn't sanitize the input, you should white list it. If you're using an RDBMS then there's a fixed, finite, set of columns the user can sort on that you can check against.

Sanitizing fields for sorting (or even querying) is more important for NoSQL databases as the interface is programmatic. MongoDB is a good (bad?) example as it interprets strings as commands so; you have to go out of your way to escape inputs even though you're programmatically using them as field names.

wglb · on July 31, 2013

you shouldn't sanitize the input, you should white list it

I think of sanitizing user input as including white listing.

jroseattle · on July 30, 2013

Nice article, but this is training wheels. Maybe it's inferred, but this article should be pointed at the rookie developer.

bigiain · on July 30, 2013

Troy writes largely for a less-technical-than-HN-commenters audience. You're unlikely to see to many "Wow, there's something I didn't know" comments here in response to his articles – but I'll bet there's many many heads nodding and thinking "I wish I'd had this article to show a certain cow-orker before he did $newbie_mistake[0]"

ams6110 · on July 30, 2013

Or, let's be honest: I wish I'd read this article before I learned this stuff the hard way.

jroseattle · on July 31, 2013

Yep, certainly didn't mean to imply it wasn't useful, just where the focus should be.

bcoates · on July 30, 2013

"Don't be that guy who has SQL injection issues" is rookie advice but mistaken ideas about what SQL injections can do are widespread (probably #2 common security misconception after "hash your passwords").

The blind timing-based information leak is pretty clever and makes a nice counterexample to a lot of wrong common-knowledge.

glhaynes · on July 30, 2013

I learned long ago to sanitize my inputs/etc so I don't think I'll write more secure code for having read it, but I hadn't ever thought through exactly how an attacker might do things like time-based blind SQL injection. So I found it interesting.

jeltz · on July 31, 2013

You should whenever you can always parametrized queries over sanitizing the input. Manually escaping or sanitizing should be the last resort.

twistedpair · on July 31, 2013

You've got to do both chief. Sure, you won't get SQL Injected with your parameterized queries, but you'll persist content that is damaging to your users. XSS is the example case where malicious content is persisted to the database and returned to the UI as trusted content.

That being said, this is not necessarily an issue with the DB tier as it is with the data input tier.

jroseattle · on July 31, 2013

Bingo. A quick summary of the article would be "if you allow external input to pass through as executable values, this can happen". Really, the article is exploring different manners of executing SQL statements against a database, and really just looking at many possible SQL scenarios. The primary culprit in every scenario is the input tier.

andyroid · on Aug 2, 2013

That's what output sanitizing is for. What's filtered for output may easily be changed over time whereas data that's been filtered when input could be enormously difficult to change once the application requirements changes.

PavlovsCat · on July 31, 2013

Manually escaping or sanitizing should be the last resort.

Why, other than "because one might forget it"?

That is, what if bits of the query depend on various conditions and need to be constructed part by part? Keeping the placeholders and the actual values synced seems like a nightmare in those cases.. unless I am missing something.

ygra · on July 31, 2013

One could imagine something like

    public static void AddParameterValue(this SqlCommand cmd, string paramName, object value)
    {
        if (cmd.CommandText.Contains("@" + paramName))
            cmd.Parameters.AddWithValue(paramName, value);
    }

to help set parameters only if they are referenced in the SQL statement.

magoon · on July 30, 2013

Which article is for intermediate and advanced?

dmix · on July 31, 2013

http://www.amazon.com/SQL-Injection-Attacks-Defense-ebook/dp...

300bps · on July 31, 2013

Anything that teaches developers about security is a good thing including his article. One minor nit - ASP.NET has been mostly shielded from these types of things for 6 or 7 years. He used ASP.NET in his examples which meant he purposely turned off several protections to get the examples to work.

jackmaney · on July 31, 2013

';DROP TABLE articles;--

twistedpair · on July 31, 2013

There was a chance it might work.