Hacker News new | past | comments | ask | show | jobs | submit login

SQL injection attacks are an excellent example where code and data are mixed. One solution is to do a lot of clever escaping of 'attackable' characters that instruct the DBMS to stop treating a character string as data and start executing things [1]. Escaping attackable characters attempts to partition data from code. This usually works but not perfectly.

Or, run your data through stored procedures instead. It took me a while to figure out why stored procedures were so much more secure than regular queries. I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.

[1] https://xkcd.com/327/




Hmm. I'm going to have to disagree about Stored Procedures providing security. You can do all sorts of bad things using stored procedures that may result in unintended code execution!

Perhaps the most naive example: https://pastebin.com/acQqhDvy

I think they're more useful for organization and abstraction than security. Then again, a well organized and smartly abstracted system can lead to better security!

But I think bind parameters are probably a better example of security.

Binding effectively separates the data from the logic. So you define two separate types of things, and then safely join those things together by binding them. It doesn't matter too much whether that happens in the application making a call to the database or in the database in a stored procedure. Obviously this same concept can be applied at many different points along the application stack. The analogous concept in the UI is templating. You define a template and then safely inject data into that template.


> I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.

This isn't well defined. Take this pseudocode stored procedure (OK, it's a python function):

    def retrieve_relevant_data(user_input):
        if user_input == 1:
            return BACKING_STORE[5]
        elif user_input == 2:
            perform_side_effects()
            return BACKING_STORE[1]
        else:
            return "Go away."
You can provide any input to that. You could think of this as a function which "treats all input as data with no possibility to run as code" (it never calls eval!). But you could also usefully think of this as defining a tiny virtual machine with opcodes 1 and 2. If you think of it that way, you'll be forced to conclude that it does run user input as code, but the difference is in how you're labeling the function, not in what the function does.

The security gain from a stored procedure, on this analysis, is not that it won't run user input as code. It will! The security gain comes from replacing the full capability of the database ("run code on your local machine") with the smaller, whitelisted set of capabilities defined in the stored procedure.


> The security gain comes from replacing the full capability of the database ("run code on your local machine") with the smaller, whitelisted set of capabilities defined in the stored procedure.

The security gain is that it you are only able to run queries that the DBA allows you to. If you can't write arbitrary queries, you won't get arbitrary results. If you can only run a stored procedure, you are abstracted away from those side effects. Another way of saying this -- the security risk is shifted from the app developer to the DBA. Someone is still writing a query (or procedure code), so there will always be some risk.


The security gain is that it you are only able to run queries that the DBA allows you to. If you can't write arbitrary queries, you won't get arbitrary results. If you can only run a stored procedure, you are abstracted away from those side effects. Another way of saying this -- the security risk is shifted from the app developer to the DBA. Someone is still writing a query (or procedure code), so there will always be some risk.

This could also be achieved with a well written microservice/package that developers go through without depending on dba.


It doesn't sound like we disagree?


The philosophy and semantics are an interesting side issue, but I'd say the default meaning of those words is that your data, in the SQL system, is not treated as SQL code.


Parameter-ized query builders are possible in every SQL library.

String escaping SQL? How is anyone thinking that is still a thing in 2017? The problem has been solved for two decades


Not just that, but they are great for sharding too.


I'm not following you, how so?


Stored procedures are bad in so many ways - they harder to deploy and revert than code, harder to unit test* , harder to refactor and every implementation that I have ever seen that has business logic in stored procedures instead of microservices/packages/modules have been a nightmare to maintain.

* At least with .Net/Entity Framework/Linq you mock out your dbcontext and test your queries with an in memory List<>

https://msdn.microsoft.com/en-us/library/dn314429(v=vs.113)....


> harder to deploy and revert than code

Agree.

> harder to unit test

Disagree. I've implemented unit tests that connect to the normal staging instance of our database, clone the relevant parts of the schema into a throw-away namespace as temporary tables, and run the tests in that fresh namespace. About 100 lines of Perl.

That was five years ago. These days, it's even easier to do this correctly since containers allow you to quickly spin up a fresh Postgres etc. in the unit test runner.


It’s even easier and faster when you don’t have to use a database at all and mock out all of your tables with in memory lists. No code at all except your data in your lists.


> easier and faster

It also need not be correct. If you're only ever doing "SELECT * FROM $table WHERE id = ?", you're fine, but a lot of real-world queries will use RDBMS-specific syntax. For example, from the top of my head, the function "greatest()" in Postgres is called "max()" in SQLite. How is it called in your mock?

Mocking out tables with in-memory lists adds a huge amount of extra code that's specific to the test (the part that parses and executes SQL on the lists). C# has this part built in via LINQ, but most other languages don't.

By the way, I see no practical difference between "in-memory lists" and SQLite, which is what I'm currently using for tests of RDBMS-using components, except for the fact that SQLite is much more well tested than $random_SQL_mocking_library (except, maybe, LINQ).


You are correct, if I were doing unit testing with any other language besides C#, my entire argument with respect to not using a DB would be moot. But I would still rather have a module/service to enforce some type of sanity on database access.

The way that Linq works and the fact that it’s actually compiled to expression trees at compile time and that the provider translates that to the destination at runtime whether it be database specific SQL, MongoQueries or C#/IL, does make this type of testing possible.


Yeah, I thought the same thing until I found a colleague who was very fond of calling exec_sql in stored procedures, with the argument being a concatenation of the sp arguments.


I think you mean parameterised queries. Stored procedures are a slightly different thing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: