
It's an "internet tech" bug. Every developper should know how hard it is to parse textual data vs well defined binary in a secure and fool proof way. Yet every damn internet piece of infrastructure is based on handling textual data, mash it up, pass it around, escape and unescape it in hundreds of stupid formats. No wonder that most security troubles surfacing over the years are some form of abuse of this crazy design flaw: buffer overruns, sql injection, the openssl bug a few month back and now this. Let's go back to sanity and use well defined binary protocols where there is no damn way to send a command by text but only very explicit semantics, and stop the unix way of thinking that text should be more than a human interface. Text should never be used as a command language in between complex programs. Period.



I really don't think making protocols less understandable by humans will solve anything.

This has always been and always will be a hard problem. Consider this quote which I found in The Shellcoder's Handbook:

"Wherever terms have a shifting meaning, independent sets of considerations are liable to become complicated together, and reasonings and results are frequently falsified." -- Ada Lovelace

We've known about this since literally the beginning; we'll be cursing ourselves over it until the very end. Vulnerabilities are going nowhere.


It's not about making protocols less understandable by humans; it's about recognising that we are programming computers, not humans. There should certainly be a way for humans to interact with the program at some point, but we should not force the same kind of interaction on the programs themselves. It's much harder to make the parsing and handling of text and text-based commands secure than it is to use binary protocols in the first place.


I just disagree. You seem to be saying we should use binary protocols and load them directly into memory Cap'n Proto style. But what if I'm little endian and you're big endian? Parsing happens. Text-based protocols fit well into humans' heads, and it's the humans who have to do the debugging. I think it would only make the process of finding bugs slower and more complex, and give an advantage to attackers.
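
For example (a minimal Python sketch, only to illustrate the byte-order point): the same four bytes decode to different integers depending on which endianness the reader assumes, so even "load it straight into memory" binary formats still need a parsing step.

    import struct

    raw = b"\x00\x00\x01\x00"
    print(struct.unpack(">I", raw)[0])  # read as big-endian: 256
    print(struct.unpack("<I", raw)[0])  # read as little-endian: 65536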


Please describe specifically how that would have helped.

This has nothing to do with parsing text. The problem here is that Apache et al. send untrusted data to a process that treats it as code. It wouldn't matter if HTTP were a binary protocol and if bash read a well-defined bytecode instead. I mean, look at shellcodes.


You are describing the problem exactly: it's all too tempting to pass text around, from user input to command-line arguments, without any way to validate the data, and to assume it's OK because it's easy. It's exactly the same argument that goes on between static and dynamic typing in programming languages: static typing ensures some sort of semantics is respected. If you pass text around because it's easy and fast, most of the time you will never validate the data, and you have no way to ensure that you are not actually handling a bomb. If the protocol were binary there is no way in hell you would be tempted to pass its data without validation to an external program, because you'd have to respect the API and because there would be no way to just send a bunch of commands. The same goes for SQL injections, URL buffer overflows, etc. Free-form text should only be used for actual human textual data and should NEVER be the interface between programs. It's way too fuzzily defined to serve as a protocol.


If the protocol were binary there is no way in hell you would be tempted to pass its data without validation to an external program, because you'd have to respect the API and because there would be no way to just send a bunch of commands

I assume you're talking about Apache - but Apache had no way of validating the data. The protocol just said "this is a blob from the client", which any binary protocol for the task must be able to handle. Apache had no business validating it, any more than it should validate any other content - how should it know what makes it valid?

Bash, on the other hand, just received that blob and treated it as an executable. It wouldn't matter if the protocol between the server and bash was binary, since it was a valid value as far as the protocol was concerned.

The problem here is the hidden channel between Apache and bash, which never actually talk directly to each other (it's through the CGI binary) but still pass data. It has nothing to do with text protocols.
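
Roughly what that hidden channel looks like, as a simplified Python sketch of the CGI convention (not Apache's actual code; run_cgi and the header dict are made up for illustration): the gateway copies client-supplied headers into HTTP_* environment variables before launching the handler, so attacker-controlled text flows straight into the child process's environment.

    import os
    import subprocess

    def run_cgi(script, headers):
        env = dict(os.environ)
        # Per the CGI convention, client headers become HTTP_* environment variables.
        for name, value in headers.items():
            env["HTTP_" + name.upper().replace("-", "_")] = value
        subprocess.run([script], env=env)

    # A request header like "User-Agent: () { :; }; echo owned"
    # ends up verbatim in HTTP_USER_AGENT.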


No, the problem is that you can treat any kind of text data as an executable. You can try to fix this by adding mountains of complexity and excuses, but it would still be true: as soon as text enters the equation you need to escape/encode/decode and parse. Every time you do that you add more complexity than is needed, and you also add many ways to abuse the programs and create "interesting bugs".


I can just as easily craft malicious binary data to execute a function, if the receiver executes binaries that begin with a few magic bytes while reading input into a buffer.

You seem to be relying on some assumption about human psychology for your security gain: somehow people would never do that with a binary protocol, and text protocols make them more comfortable and trusting. At least I can read text protocols directly; binary protocols mean trusting a bunch of middleware to read them, too, or writing my own (always great for security).


No, I rely on the fact that any version of an "eval" function should just not exist, and that any text-based protocol encourages the existence of such functions that will execute whatever is thrown at them, just because it sounds so easy and like a quick shortcut in API design.


If the protocol were binary, exporting variables to subprocesses and exporting functions to subprocesses would go in different places, and Apache would know to send the one but not the other.


How, if the protocol in question - the environment variables - has no concept of functions?

The point is that Apache and the protocols (HTTP and environment vars) are just being used as a tunnel between the attacker and bash. They can't pass functions via another channel because they don't know what functions are. All they know is that they're passing blobs of data - which any protocol would do, binary or not.

Bash happens to recognize a text value as functions, but it could just as easily recognize the magic value of an ELF binary and execute that, or any other binary format used to encode functions.
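
That recognition step is exactly what the classic Shellshock test exercises; here's a small Python sketch of it (it prints "vulnerable" only if the bash on your system is unpatched, and nothing interesting otherwise):

    import subprocess

    # The value of "x" is just text as far as the environment is concerned;
    # a vulnerable bash parses the "() {" prefix as a function definition and
    # then runs the trailing command.
    env = {"PATH": "/usr/bin:/bin", "x": "() { :; }; echo vulnerable"}
    subprocess.run(["bash", "-c", "echo regular command"], env=env)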


The problem is that Bash is using the same channel for two quite different things - values and functions. It's doing that because the channel is a string; if there were a proper protocol for passing environment to subprocesses, that protocol would make a distinction between the two.


if there were a proper protocol for passing environment to subprocesses, that protocol would make a distinction between the two.

TCP is a binary protocol; how does it distinguish between executable and plain-text formats? Answer: it doesn't, because TCP doesn't know or care about that; that's left to the layers above to handle.

Likewise, environment variables don't know or care about "functions"; that concept doesn't enter into the protocol, since it's not a shell-specific protocol. All it transmits are keys and values, which are generic blobs of data. That bash uses the protocol to transmit code mixed up with data is no more the protocol's fault than the fact that TCP was used to transmit those same functions in HTTP requests.


In principle, a distinguishing protocol could be embedded within the undistinguished one. If the actual environment variables were preceded with a sequence indicating the type of the contents in all cases, this would not be an issue.
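
A minimal sketch of such an embedded tag, in Python (the "V:"/"F:" encoding is hypothetical, not anything bash or the environment actually does): each value carries its type, and the importer can refuse to treat data as code.

    # Hypothetical tagged encoding layered on top of plain string values.
    def tag_value(v):
        return "V:" + v          # plain data

    def tag_function(body):
        return "F:" + body       # explicitly marked as code

    def import_env_value(raw):
        kind, _, content = raw.partition(":")
        if kind != "V":
            raise ValueError("refusing to import anything but plain values")
        return content

    print(import_env_value(tag_value("hello")))      # ok
    # import_env_value(tag_function("() { :; }"))    # raises instead of executing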


You're getting close, but you missed the essence. It's not about text vs. binary. It's also not about "well-documented" vs. "ad-hoc". It's about preserving semantics.

"The Unix Way" means throwing away all semantic data - passing plain strings with no context, which are then parsed and re-parsed in an completely ad-hoc manner, usually with regexp-based shotgun parsers.

Note how SQL injection or XSS attacks are prevented - people stopped stitching strings together and started generating proper instructions through code. User input is sanitized and driven through a process that converts it from an untrusted string into a trusted data structure. Typing SQL queries in a semantic-aware system looks almost the same as stitching strings (thanks to SQL being flat), but now you can't possibly SQL-inject yourself.

So in general: stick to text formats or not, but whatever you do, never glue data structures together using tools that work at the data-medium layer and are not aware of the structure and meaning of the data they operate on. E.g. never glue strings together to build SQL queries or HTML code.
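
To make the SQL point above concrete, a minimal Python/sqlite3 sketch (the table and input are made up for illustration): gluing strings lets the input rewrite the query, while a parameterized query keeps it as plain data.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")

    evil = "' OR '1'='1"

    # String gluing: the input is spliced into the query text and changes its meaning.
    glued = "SELECT * FROM users WHERE name = '" + evil + "'"
    print(len(conn.execute(glued).fetchall()))  # 1 -- the injected OR matches every row

    # Parameterized: the query structure is fixed; the input stays a value.
    safe = conn.execute("SELECT * FROM users WHERE name = ?", (evil,))
    print(len(safe.fetchall()))                 # 0 -- nobody is literally named that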


It's not about text vs binary but about well-specified, documented and understood, versus ad-hoc, convention-based, "infinitely extensible", and taught-through-blogs.


All very well said, but until tools like Protocol Buffers became popular, the tooling for working with custom binary protocols was pretty dire.



