When you search for "${", the page is missing 26 lines of minified JavaScript (lines 9-35 of a non-broken page, at least for me), almost certainly because of a templating bug. These lines, among other things, are responsible for adding the top toolbar to the page. (The missing JS is here: http://pastebin.com/B9cy3T2c)
In my experience it's pretty rare for template language bugs to cause errors when user-entered content includes one of their special characters. The template would have to be evaluated twice for any problems to occur: once to insert the user's template code into placeholders within the original template, and then once again to execute the resulting combination.
I disagree! It does not make sense that submitting any text via a form input should in any way interfere with a templating engine, in the same way we don't expect to be able to affect a database by entering SQL into a form field.
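Python's standard `string.Template` happens to use the same `${name}` placeholder syntax, so it makes a convenient stand-in for illustrating the double-evaluation failure. This is a minimal sketch with a hypothetical page template; it says nothing about what Google actually runs.

```python
from string import Template

# Hypothetical page template using ${...} placeholders.
PAGE = Template("<title>${query} - Search</title>")


def render_once(query: str) -> str:
    # Correct: the user's text is substituted as data and never re-parsed.
    return PAGE.substitute(query=query)


def render_twice(query: str) -> str:
    # Buggy: the already-filled page is parsed as a template a second time,
    # so any "$" the user typed is now interpreted as template syntax.
    return Template(render_once(query)).substitute()


if __name__ == "__main__":
    print(render_once("${"))   # <title>${ - Search</title>
    print(render_twice("$$"))  # <title>$ - Search</title>  ("$$" collapses)
    try:
        render_twice("${")
    except ValueError as err:
        print("second pass failed:", err)
```

Rendering once is harmless for any input; only the second pass turns `$$` into `$` and makes `${` a hard error.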
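The SQL half of that analogy in concrete form: with a parameterized query, the driver passes user input as a value, never as SQL text, so `'` or `${` are just characters. A minimal sketch using Python's built-in `sqlite3` with an in-memory database; the `searches` table is made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical "searches" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (query TEXT)")


def log_search(query: str) -> None:
    # The "?" placeholder binds the value; it is never spliced into the SQL.
    conn.execute("INSERT INTO searches (query) VALUES (?)", (query,))


for q in ["${", "'; DROP TABLE searches; --", "$$"]:
    log_search(q)

# The hostile-looking inputs are stored verbatim and the table still exists.
rows = [r[0] for r in conn.execute("SELECT query FROM searches ORDER BY rowid")]
print(rows)
```

The equivalent discipline for templates is to substitute user text once, as a value, and never feed the result back in as template source.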
The fact that Google brings back an empty result set to me indicates the problem is a bit deeper...
I did a diff of the source code for a SERP for "${" and a SERP for "$$", ignored any lines that were the same except for s/${/$$/, and then un-minified with http://jsbeautifier.org/
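The diff step described above can be sketched with Python's `difflib`, assuming both result pages were already saved as text (the un-minifying with jsbeautifier is a separate step not shown). The sample pages below are placeholders, not Google's real markup.

```python
import difflib


def interesting_diff(page_a: str, page_b: str) -> list[str]:
    """Lines added/removed between two SERPs, ignoring the query itself."""
    # Apply s/${/$$/ to the "${" page so lines that differ only by the
    # search term compare equal to their "$$" counterparts.
    a_lines = [line.replace("${", "$$") for line in page_a.splitlines()]
    b_lines = page_b.splitlines()
    diff = list(difflib.unified_diff(a_lines, b_lines, lineterm=""))
    # Skip the two "---"/"+++" header lines and keep only +/- lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Placeholder pages, not real Google markup.
broken = "<html>\n<title>${ - Search</title>\n<body>results</body>\n</html>"
working = (
    "<html>\n<title>$$ - Search</title>\n<script>toolbar()</script>\n"
    "<body>results</body>\n</html>"
)
print(interesting_diff(broken, working))  # ['+<script>toolbar()</script>']
```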
Tip to the poster, and to anyone: Google (and Facebook, and others) have bug bounty programs. You can get paid tens to thousands of dollars if you report vulns to the vendor first.
The base reward for qualifying bugs is $500. If the rewards panel finds a particular bug to be severe or unusually clever, rewards of up to $3,133.7 may be issued.
For random XSS-style things on web pages, that's a high bounty.
The crazy lucrative bugs you may have heard of tend to be drive-by remote code execution in popular client-side software (like IE or Flash), and the stories about valuations tend to be apocryphal.
So, that gives you your BATNA for negotiations with the dark side...
Either Google is very confident that they don't have serious bugs or they are setting themselves up for a problem. Imagine the value of finding a serious bug in AdSense or AdWords.
For me just typing ${ breaks the layout. I agree it probably has something to do with a template engine. I know Java EL uses the syntax ${variable_name} and so does Velocity Templates.
Guys (and I don't mean Google, I mean all of us): don't fix injection by plugging individual injection bugs; put together a framework that actually avoids all of these problems (or at least doesn't let you add bugs).
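One way to read "a framework that doesn't let you add bugs": make escaping the default and make the unsafe path explicit. A minimal sketch with a hypothetical `SafeHtml` marker type and `render` helper (not any particular framework's API): plain strings are always passed through `html.escape`, and only values deliberately wrapped in `SafeHtml` go in unescaped.

```python
import html
from string import Template


class SafeHtml(str):
    """Hypothetical marker for markup the developer has vouched for."""


def render(template: str, **values: object) -> str:
    # Escape every value unless it was explicitly marked safe.
    escaped = {
        key: value if isinstance(value, SafeHtml) else html.escape(str(value))
        for key, value in values.items()
    }
    # Values are substituted once and never re-parsed as template code.
    return Template(template).substitute(escaped)


page = render(
    "<h1>${title}</h1><p>${body}</p>",
    title="Results for <script>",
    body=SafeHtml("<em>trusted markup</em>"),
)
print(page)  # <h1>Results for &lt;script&gt;</h1><p><em>trusted markup</em></p>
```

Forgetting to escape is no longer possible; the only way to inject markup is to write `SafeHtml(...)`, which is easy to grep for in review.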
This is actually a hard problem in the general case, and it is an active area of research. One promising approach is static taint analysis, wherein the source code of a web app is analyzed to detect whether "tainted" output is given to a sensitive "sink" without being properly sanitized. See, e.g., Omer Tripp et al., "TAJ: Effective Taint Analysis of Web Applications" (PLDI 2009) (http://www.cs.tau.ac.il/~omertrip/pldi09/paper.pdf).
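Static taint analysis like TAJ works on the source code without running it, which is too involved for a snippet. The source/sink/sanitizer vocabulary can still be shown with a much cruder runtime analogue (dynamic taint tracking, a different technique). In this sketch, a hypothetical `Tainted` string type marks request data, `sanitize` is the only way to drop the mark, and the `emit` sink refuses anything still tainted.

```python
import html


class Tainted(str):
    """Hypothetical marker for data that came from an untrusted source."""

    # Propagate the mark through concatenation. Slicing, join(), format()
    # and friends would all need the same treatment, which is part of why
    # real taint tracking is hard.
    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str(other) + str(self))


def source(raw: str) -> Tainted:
    # Source: everything read from the request starts out tainted.
    return Tainted(raw)


def sanitize(value: str) -> str:
    # Sanitizer: HTML-escaping yields a plain, untainted str.
    return html.escape(str(value))


def emit(fragment: str) -> str:
    # Sink: refuse to output anything that still carries the mark.
    if isinstance(fragment, Tainted):
        raise ValueError("tainted data reached an HTML sink")
    return fragment


query = source("<script>alert(1)</script>")
print(emit("<p>" + sanitize(query) + "</p>"))  # escaped, accepted
try:
    emit("<p>" + query + "</p>")  # unsanitized, rejected
except ValueError as err:
    print(err)
```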
As an example of a difficult case, consider the following pseudocode snippet:
That's a poor example. I would never send a document as HTML without tags. html_sanitize() should really be generate_html(), which adds structure to the document.
What the GP is saying (and I agree) is that generate_html() should use a library which understands HTML structure and only allows content to be generated through a strict API (no doc+="<foo>bar</foo>" garbage).
Such a discipline greatly reduces the chance of injections, to the point where you have to actively write code to create injection points. And it's simple to follow: any time you write HTML, use the library.
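A minimal sketch of that discipline using Python's standard `xml.etree.ElementTree`, which builds a tree of elements and serializes it (here with `method="html"`), escaping text content along the way. The `comment_html` function is hypothetical, and a real project might prefer a dedicated HTML builder; the point is that no code path concatenates raw strings into markup.

```python
import xml.etree.ElementTree as ET


def comment_html(author: str, body: str) -> str:
    # Build the structure through the API; user text only ever becomes .text.
    div = ET.Element("div", {"class": "comment"})
    ET.SubElement(div, "b").text = author
    ET.SubElement(div, "p").text = body
    # The serializer escapes &, < and > in text (note that in "html" mode it
    # writes the contents of <script> and <style> elements raw).
    return ET.tostring(div, encoding="unicode", method="html")


print(comment_html("eve", "<script>alert(1)</script>"))
# <div class="comment"><b>eve</b><p>&lt;script&gt;alert(1)&lt;/script&gt;</p></div>
```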
Taint analysis sounds nice in theory, but you can get the same effect by writing code modularly (i.e. only one small module can actually access the raw output stream) and using libraries to create structured data.
You don't know where "doc" is coming from in a snippet like that. I use logic moderately similar to that for my blog software. If I'm writing the blog post, pretty much pass through what I wrote. If it's a comment (back when I had them), process the heck out of it. The blog post itself is a blob of HTML, basically.
I do not currently write my blog posts in a templating language (any more than anyone else does), though Hamlet [1] has me sort of tempted as it is so close to what I write anyhow.
It's not a poor example, and it's not about sending the document without tags. It's about whether special characters should be escaped, and the answer depends on the Content-Type that the client requested.
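A minimal sketch of "the right escaping depends on the Content-Type", with a hypothetical `encode_for` helper: the same string is HTML-escaped for `text/html`, JSON-encoded for `application/json`, and passed through untouched for `text/plain`. The media types are real; the function and its refusal to guess for unknown types are assumptions for illustration.

```python
import html
import json


def encode_for(content_type: str, text: str) -> str:
    # Pick the escaping by the media type the response will be served as.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return html.escape(text)
    if media_type == "application/json":
        return json.dumps(text)
    if media_type == "text/plain":
        return text  # nothing is special in plain text
    raise ValueError(f"no encoder for {media_type!r}")


doc = 'Tom & "Jerry" <3'
for ct in ("text/html; charset=utf-8", "application/json", "text/plain"):
    print(ct, "->", encode_for(ct, doc))
```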
You're making the same mistake made by people who confuse good static type systems with type inference. Yes, it would be nice to infer the correct escaping function, but forcing people to escape somehow would be sufficient.
Nice article. In response to the last part, I can think of a way to achieve the sort of smart escaping via templates you talk about using Haskell and Hamlet (among other templating systems used by Yesod). I believe, although I can't absolutely confirm, that Hamlet already performs context-appropriate escaping, based mostly on the type signatures and the names of a few of the functions.
Yes, Hamlet does context-specific escaping. It will handle all the examples given, except you can't mix your JavaScript in with your HTML (which is generally good advice anyway).
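Context-specific escaping means the same value is encoded differently depending on where it lands in the page. A minimal sketch of three such contexts using only Python's standard library; the helper names are hypothetical and this is not how Hamlet implements it.

```python
import html
from urllib.parse import quote


def in_text(value: str) -> str:
    # Element content: only &, < and > are special here.
    return html.escape(value, quote=False)


def in_attribute(value: str) -> str:
    # Quoted attribute value: quotes must be escaped too.
    return html.escape(value, quote=True)


def in_url_param(value: str) -> str:
    # Query-string value: percent-encode everything reserved, then
    # HTML-escape the result because the URL sits inside an attribute.
    return in_attribute(quote(value, safe=""))


user = 'a"b <c>&d'
link = (
    f'<a href="/search?q={in_url_param(user)}" '
    f'title="{in_attribute(user)}">{in_text(user)}</a>'
)
print(link)
```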
I disagree with the article's premise that injection is always a display issue. In the [Yesod web framework](http://www.yesodweb.com), which uses Hamlet, we sanitize HTML by default (rather than stripping it) before it is ever put in the database. The more you can make injection not a display issue, the better; you just have to know your options.
I don't know what you mean by "90% problem", but unlike what your blog article suggests, Django's template engine escapes everything by default. You have to explicitly pass content through a filter to request that it not be escaped.
Since the approach suggested in your blog article would make it easy for someone to forget the "|escape" on a variable, I would say your methodology only solves the "90% problem".
Because of Django's stance of "we want the templating system to be general, to be usable for stuff other than HTML", it can't provide features that guarantee the output is well-formed, valid, and free of injection entry points.
I think that Django uses sha for password hashes. They should use bcrypt, right? Did you turn on XSRF protection (which I think is off by default)? Are cookies secure?
Web security is not as simple as 's.replace("<", "&lt;")' (escaping by default).
How do you generate slugs? Could someone put something nasty in a pathname or URL?
Django is not secure. You can secure it, with minimal effort, if you keep things radically simple. But you do need to know what can go wrong, so you don't introduce any "features" that are actually "gaping security holes".
Even if you do everything right, it doesn't mean that Django is magically secure no matter what people use it for (obvious, yes, but this is HN, and sometimes failing to point out the obvious can get you downvoted). That's why people are objecting.
Handling user-submitted image tags is (in my opinion) way outside the scope of the framework. Which tags and attributes to whitelist, or whether to use html markup at all compared to a different language like markdown, is very project dependent. If you have to, just install BeautifulSoup or any of the other great libraries that have cropped up in the last year or so to handle the sanitizing.
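If you do roll your own, the core of a whitelist sanitizer is small. A minimal sketch using the standard library's `html.parser.HTMLParser` (not BeautifulSoup): the tag whitelist is a made-up example, every attribute is dropped, and all text is re-escaped. It deliberately ignores the hard parts a production sanitizer needs, such as allowing `href` safely (no `javascript:` URLs) and balancing unclosed tags.

```python
import html
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}  # hypothetical whitelist


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep whitelisted tags but drop all of their attributes.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on output.
        self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="steal()">hi</b><img src=x onerror=alert(1)>'))  # <b>hi</b>
```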
Django uses sha for password hashes because until recently there hasn't been a better library to ship with natively across all the platforms that Django supports. If you know you'll only be working on *nix, django-bcrypt can enhance the default password hashing behavior. As other commenters have noted, they're moving to PBKDF2 in the near future as a better included hashing library.
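For reference, PBKDF2 ships in Python's standard library as `hashlib.pbkdf2_hmac`. A minimal sketch of salted hashing with constant-time verification; the iteration count and the `iterations$salt$digest` storage format are illustrative assumptions, not Django's actual values or format.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative; tune to your hardware


def hash_password(password: str) -> str:
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Hypothetical storage format: iterations$salt_hex$digest_hex
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Storing the iteration count alongside the hash is what lets a site raise it later without invalidating existing passwords.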
CSRF is on by default. If you need secure cookies and HSTS headers, there's a package that provides them called django-secure, which last I heard is being rolled into Django proper in the near future.
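What "secure cookies" means at the HTTP level: the `Secure` flag keeps the cookie off plain-HTTP requests, `HttpOnly` hides it from `document.cookie`, and `SameSite` limits cross-site sending (a useful complement to CSRF tokens). A minimal sketch with Python's `http.cookies`; the `samesite` attribute needs Python 3.8+, and the cookie name and value are placeholders.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sessionid"] = "placeholder-session-token"
morsel = cookie["sessionid"]
morsel["path"] = "/"
morsel["secure"] = True     # only sent over HTTPS
morsel["httponly"] = True   # not readable from JavaScript
morsel["samesite"] = "Lax"  # withheld from cross-site subrequests and POSTs

header = cookie.output(header="Set-Cookie:")
print(header)
```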
Django prevents path traversal and anything else you can imagine that might be nasty in a URL. The auto slug generation included.
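A minimal sketch of the two defenses being discussed: a slug function that only ever emits `[a-z0-9-]`, and a path check that rejects anything resolving outside a base directory. Both functions are illustrations under those assumptions, not Django's `slugify` or its path-handling code.

```python
import os
import re
import unicodedata


def slugify(text: str) -> str:
    # Fold accents to ASCII, then keep only lowercase letters, digits, hyphens.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def safe_join(base: str, user_path: str) -> str:
    # Resolve the candidate path and refuse anything that escapes base.
    base_real = os.path.realpath(base)
    candidate = os.path.realpath(os.path.join(base_real, user_path))
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError("path escapes the base directory")
    return candidate


print(slugify("Crème brûlée ${Recipe} ../etc"))  # creme-brulee-recipe-etc
```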
So how exactly is Django not secure again? Where are the "gaping security holes"? Or do you have no idea what you're talking about?
CSRF is on by default.
Cookies could be more secure, and it's being worked on.
Django is moving to PBKDF2 (there's no pure python bcrypt lib).
There's not really opportunity to do anything interesting with slugs.
Like any framework, there will always be room to improve security, but it does do very well out of the box. At least it makes you work to expose anything obvious.
...which you can do simply by posting a link anywhere.
Edit: I guess it would be more helpful to explain why, for those not familiar with XSS. If all it takes is a specially crafted URL to your site to exploit it, your site is toast. The security model of the web assumes that people can open even the shadiest of links without negative consequences. I could have obscured the URL with a shortener and named the link "Cutest cat pic ever!" I could have hosted a page on a totally separate domain and put the crafted URL in a hidden iframe. All I have to do is send document.cookie over to my server and now I control your account.
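The attack described here is reflected XSS: the page copies part of the URL into its HTML. A minimal sketch of a hypothetical search handler using only the standard library, with the vulnerable version next to the escaped one; the `site.example` URL is made up.

```python
import html
from urllib.parse import parse_qs, urlsplit


def get_query(url: str) -> str:
    return parse_qs(urlsplit(url).query).get("q", [""])[0]


def search_page_vulnerable(url: str) -> str:
    # BUG: the query parameter is pasted into HTML verbatim.
    return f"<p>Results for {get_query(url)}</p>"


def search_page_escaped(url: str) -> str:
    # Escaping turns markup characters into inert entities.
    return f"<p>Results for {html.escape(get_query(url))}</p>"


crafted = "https://site.example/search?q=%3Cscript%3Ealert(document.cookie)%3C%2Fscript%3E"
print(search_page_vulnerable(crafted))  # contains a live <script> element
print(search_page_escaped(crafted))     # contains &lt;script&gt; text instead
```

Marking session cookies `HttpOnly` limits what such a script can steal, but it does not remove the injection itself.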
If this is a templating engine type thing, you should be able to do something like
${KEYWORD}${
That is, if you can figure out what "KEYWORD" is for a given template tag. I tried "links" and a few others but couldn't identify a working one; it does still reproduce the bug, though.
And I thought this was one of the most sanitized input fields on the internet.
Let's see how long it takes Google to deploy a fix to their search.
The Google Search Appliance, with roots in Google proper, generates XML results, which are then (normally) transformed via XSLT. (I'm looking at you, XPath.)
On a related note, searches for many symbols do not produce results. Searching for '&' will bring up results for ampersand, but most others that map well to words do not, e.g. $ => dollar, % => percent, etc.
I would have been suspicious of a hyperlink in this context, since this is potentially discussing an XSS vulnerability on Google (not necessarily, but maybe).
Everyone seems to avoid mentioning the obvious: this only breaks the page layout and is not really a critical issue for Google's integrity or security.
Still, this obviously doesn't look good. Above anything else, Google has excelled at being simple and reliable. All this JavaScript goodness added recently might be a step in the wrong direction. If stuff like this starts to happen every now and then, Google's reputation might be at stake.
If your templating language is going to use a magic character it would seem useful to pick something less common than $. There are several odd characters on my keyboard (§`~±|¤) and if you are willing to use the ALT key there are really obscure characters that you can safely filter from the input instead of going through the trouble of escaping them. Filtering is so much more efficient/easier/safer than escaping.
Imagine how much easier life would be if in HTML we only had to filter for § instead of escape every <, > and ".
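A minimal sketch of the trade-off being proposed, for a hypothetical template syntax whose only magic character is `§`: filtering simply deletes that character from user input, while escaping (here for HTML, via `html.escape`) keeps every character but has to cover each special one. Filtering does discard the character, so a user who genuinely types `§` loses it.

```python
import html

MAGIC = "§"  # hypothetical template delimiter


def filter_input(text: str) -> str:
    # Filtering: drop the one reserved character entirely.
    return text.replace(MAGIC, "")


def escape_for_html(text: str) -> str:
    # Escaping: keep every character, but encode all HTML-special ones.
    return html.escape(text)


sample = 'See § 3: "a < b" & more'
print(filter_input(sample))
print(escape_for_html(sample))
```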
That's my homepage. It's been a long time since I read it.
Reading it now as a treatise from my younger self, though I didn't write it, I realise that spirit is lost. For a while it was "our" place but now we have to return to the underground.