Hacker News new | past | comments | ask | show | jobs | submit login

> if html used length-prefixed tags rather than open/close tags most injection attacks would go away immediately.

That's not really the problem. The problem is there is no distinction between data and control leading to everything coming to you in one binary stream. If the control aspect would be out-of-band then the problem would really go away.

Length prefixes will just turn into one more thing to overwrite or intercept and change. That's much harder to do when you can't get at the control channel but just at the data channel. Many old school protocols worked like this.




Thank you.

This is the important takeaway here. Changing the encoding simply swaps out one set of vulnerabilities and attacks for another. Separating control flow and data is the actual silver bullet for this category of attacks.

Unfortunately, there’s rarely ever a totally clear logical separation between the two. Anything you want to bucket into “control”, someone else is going to want the client to be able to manipulate as data.


I'm having a hard time seeing how having separate control and data streams would have an effect here. Using FTP to retrieve a document isn't more secure than HTTP... the problem is in how the document itself is parsed. If you added a separate side channel for requesting data (a la FTP), you'd still have the issue of parsing the HTML on the other side.

Granted, if you made that control channel stateful, you'd make a lot of problems go away. But you could do that with a combined control/data stream too.

What am I missing? How would an out-of-band control channel make things easier?

That said, I think many issues with the web could be solved by implementing new protocols as opposed to shoehorning everything into HTTP just to avoid a firewall...


It makes sure that all your code is yours and that no matter what stuff makes it into the data stream it will never be able to do anything because it is just meant to be rendered.

So <html>abc</html> would go as

<html><datum 1></html> where datum 1 would refer to the first datum in the data stream, being 'abc' and no matter what trickery you'd pull to try to put another tag or executable bit or other such nonsense in the datum it would never be interpreted. This blocks any and all attacks based on being able to trick the server or eventual recipient browser of the two streams to do something active with the datum, it can only be passive data by definition.

For comparison take DTMF, which is inband signalling and so easily spoofed (and with the 'bluebox' additional tones may be generated that unlock interesting capabilities in systems on the line) and compare with GSM which does all its signaling out-of-band, and so is much harder to spoof.

The web is basically like DTMF, if you can enter data into a form and that data is spit back out again in some web page to be rendered by the browser later on you have a vector to inject something malicious and it will take a very well thought out sanitation process to get rid of all the possibilities in which you might do that.

If the web were more like GSM you could sit there and inject data in to the data channel until the cows came home but it would never ever lead to a security issue.

No amount of extra encoding and checks will ever close these holes completely as long as the data stays 'in band' with the control information.


I guess what I'm getting at is that it isn't HTTP that's the issue -- it's HTML. I'm all for a control channel in HTTP. But you're still stuck parsing <html><datum_1></html>, and it is difficult to think about reorganizing each tag as a separate datum. At what level do you stop converting the data into separately requestable bits? How would you even code it? And making the tags themselves length-prefixed (like csexp's) wouldn't entirely solve the problem.

I could easily see making <script> and <link> resources required to be separately requested (like images are now -- ignoring data/base64 resources), but we're back to redefining HTML.

I'm not arguing against that...

It's really hard to have these types of debates though, because everyone focuses on different problems of the HTTP/HTML webapp request/response cycle. Like you said, adding separate control/data channels would help, but that doesn't solve SQL injection attacks (which is a whole other class, but that's not really an HTTP/HTML issue, it's a backend issue and I don't see how you'd avoid that with a simple protocol change). Simply making HTTP stateful could potentially solve a different class of session highjacking, etc...

There are so many attack vectors that I think it does make sense to think about what a replacement for HTTP/HTML would look like. Most of these problems arise from trying to re-engineer a document format (HTML) to support interactive webapps. We should think about how to do this better... (without recreating ActiveX -- shudder).


> I could easily see making <script> and <link> resources required to be separately requested (like images are now -- ignoring data/base64 resources), but we're back to redefining HTML.

This has been implemented in HTTP (not HTML); you can enable the requirement right now by serving your pages with an appropriate Content-Security-Policy header.


Or, e.g. my preferred encoding of HTML:

    (html "abc")
This guarantees that no matter what is inside "abc" it simply can't escape into the control stream:

    (html "This is not (malicious \"boo\")")
This is just a pretty display of what would actually be these bytes:

    (4:html29:This is not (malicious "boo"))
It doesn't matter what one puts in the atom: it can't escape and damage the control stream.


The two are very different:

    (html "user content")

    user content := " (script "something malicious")"

    (html "" (script "something malicious"))
the length-prefixed version cannot escape in this way.


SQL injection attacks are an excellent example where code and data are mixed. One solution is to do a lot of clever escaping of 'attackable' characters that instruct the DBMS to stop treating a character string as data and start executing things [1]. Escaping attackable characters attempts to partition data from code. This usually works but not perfectly.

Or, run your data through stored procedures instead. It took me a while to figure out why stored procedures were so much more secure than regular queries. I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.

[1] https://xkcd.com/327/


Hmm. I'm going to have to disagree about Stored Procedures providing security. You can do all sorts of bad things using stored procedures that may result in unintended code execution!

Perhaps the most naive example: https://pastebin.com/acQqhDvy

I think they're more useful for organization and abstraction than security. Then again, a well organized and smartly abstracted system can lead to better security!

But I think bind parameters are probably a better example of security.

Binding effectively separates the data from the logic. So you define two separate types of things, and then safely join those things together by binding them. It doesn't matter too much whether that happens in the application making a call to the database or in the database in a stored procedure. Obviously this same concept can be applied at many different points along the application stack. The analogous concept in the UI is templating. You define a template and then safely inject data into that template.


> I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.

This isn't well defined. Take this pseudocode stored procedure (OK, it's a python function):

    def retrieve_relevant_data(user_input):
        if user_input == 1:
            return BACKING_STORE[5]
        elif user_input == 2:
            perform_side_effects()
            return BACKING_STORE[1]
        else:
            return "Go away."
You can provide any input to that. You could think of this as a function which "treats all input as data with no possibility to run as code" (it never calls eval!). But you could also usefully think of this as defining a tiny virtual machine with opcodes 1 and 2. If you think of it that way, you'll be forced to conclude that it does run user input as code, but the difference is in how you're labeling the function, not in what the function does.

The security gain from a stored procedure, on this analysis, is not that it won't run user input as code. It will! The security gain comes from replacing the full capability of the database ("run code on your local machine") with the smaller, whitelisted set of capabilities defined in the stored procedure.


> The security gain comes from replacing the full capability of the database ("run code on your local machine") with the smaller, whitelisted set of capabilities defined in the stored procedure.

The security gain is that it you are only able to run queries that the DBA allows you to. If you can't write arbitrary queries, you won't get arbitrary results. If you can only run a stored procedure, you are abstracted away from those side effects. Another way of saying this -- the security risk is shifted from the app developer to the DBA. Someone is still writing a query (or procedure code), so there will always be some risk.


The security gain is that it you are only able to run queries that the DBA allows you to. If you can't write arbitrary queries, you won't get arbitrary results. If you can only run a stored procedure, you are abstracted away from those side effects. Another way of saying this -- the security risk is shifted from the app developer to the DBA. Someone is still writing a query (or procedure code), so there will always be some risk.

This could also be achieved with a well written microservice/package that developers go through without depending on dba.


It doesn't sound like we disagree?


The philosophy and semantics are an interesting side issue, but I'd say the default meaning of those words is that your data, in the SQL system, is not treated as SQL code.


Parameter-ized query builders are possible in every SQL library.

String escaping SQL? How is anyone thinking that is still a thing in 2017? The problem has been solved for two decades


Not just that, but they are great for sharding too.


I'm not following you, how so?


Stored procedures are bad in so many ways - they harder to deploy and revert than code, harder to unit test* , harder to refactor and every implementation that I have ever seen that has business logic in stored procedures instead of microservices/packages/modules have been a nightmare to maintain.

* At least with .Net/Entity Framework/Linq you mock out your dbcontext and test your queries with an in memory List<>

https://msdn.microsoft.com/en-us/library/dn314429(v=vs.113)....


> harder to deploy and revert than code

Agree.

> harder to unit test

Disagree. I've implemented unit tests that connect to the normal staging instance of our database, clone the relevant parts of the schema into a throw-away namespace as temporary tables, and run the tests in that fresh namespace. About 100 lines of Perl.

That was five years ago. These days, it's even easier to do this correctly since containers allow you to quickly spin up a fresh Postgres etc. in the unit test runner.


It’s even easier and faster when you don’t have to use a database at all and mock out all of your tables with in memory lists. No code at all except your data in your lists.


> easier and faster

It also need not be correct. If you're only ever doing "SELECT * FROM $table WHERE id = ?", you're fine, but a lot of real-world queries will use RDBMS-specific syntax. For example, from the top of my head, the function "greatest()" in Postgres is called "max()" in SQLite. How is it called in your mock?

Mocking out tables with in-memory lists adds a huge amount of extra code that's specific to the test (the part that parses and executes SQL on the lists). C# has this part built in via LINQ, but most other languages don't.

By the way, I see no practical difference between "in-memory lists" and SQLite, which is what I'm currently using for tests of RDBMS-using components, except for the fact that SQLite is much more well tested than $random_SQL_mocking_library (except, maybe, LINQ).


You are correct, if I were doing unit testing with any other language besides C#, my entire argument with respect to not using a DB would be moot. But I would still rather have a module/service to enforce some type of sanity on database access.

The way that Linq works and the fact that it’s actually compiled to expression trees at compile time and that the provider translates that to the destination at runtime whether it be database specific SQL, MongoQueries or C#/IL, does make this type of testing possible.


Yeah, I thought the same thing until I found a colleague who was very fond of calling exec_sql in stored procedures, with the argument being a concatenation of the sp arguments.


I think you mean parameterised queries. Stored procedures are a slightly different thing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: