Hacker News
Why does Google prepend while(1); to their JSON responses? (stackoverflow.com)
598 points by vikas0380 on May 6, 2017 | 110 comments



I believe this hack (JSON Hijacking) was discovered by Jeremiah Grossman in 2005[1].

It's fascinating to read how he discovered it and how quickly Google responded.

[1] - http://blog.jeremiahgrossman.com/2006/01/advanced-web-attack...


Why don't browsers strip cookies when they are doing cross domain javascript fetches?


Lack of focus, despite many years of research, literature, and attempts; interference with problematic techniques that became really popular when the alternatives sucked, like JSONP before CORS was ready and before CSP was even thought of; and worry about touching parts of the platform that have been essentially unchanged since the beginning, versus the parts that are fairly new and have in turn evolved more quickly.

On the subject of the new SameSite cookie attribute, I wrote a post that summarizes my views [1]. It doesn't make for good quoting, but I briefly recount the history of CSRF and how mainstream knowledge of it came around 2006-2008, some 5 years after the first sources that mention mitigating it -- a 2008 academic paper credits "(...) Chris Shiflett and Jeremiah Grossman for tirelessly working to educate developers about CSRF attacks (...)" -- Shiflett being the same person who first wrote about this in 2003, and Grossman the one who discovered this flaw in Gmail in 2006.

[1] https://news.ycombinator.com/item?id=13691022


There is a newish cookie attribute called SameSite that does exactly this. Chrome is the only browser to support it so far, though.


I read about this recently. It's hard to believe these cookies didn't exist until 2016.

The biggest problem solved by cookies has always been sessions, and SameSite is sufficient for most sessions. It seems like it should have been the default from the beginning.
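
For reference, a sketch of what an opted-in session cookie might look like (the cookie name and value are made up); with SameSite=Lax the browser won't attach it to most cross-site requests:

    Set-Cookie: sessionid=abc123; Path=/; Secure; HttpOnly; SameSite=Lax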


Because that's the way the internet works, and breaking it means breaking a lot of websites. Web security wasn't thought through carefully when the web was built; it's just a bunch of dirty hacks around the most obvious vulnerabilities.


It would be easy to make sending credentials opt-in in a new HTTP or HTML version. The way it's done now is backwards IMHO.

Define httpsb:// to be like https://, but any site may make AJAX and similar requests to it (without credentials). Then add some kind of exception (like CSRF protection), or use legacy https, for the cases where you need to send cookies.


But wouldn't an attacker simply use <script src="https://.."> instead of <script src="httpsb://..">?


Only if that is supported by the site being attacked. If the site only accepts httpsb connections, then the attacker would not have a way in.


If the site accepts httpsb, it might as well support the Origin header [0], and the problem is solved.

[0]: https://wiki.mozilla.org/Security/Origin


The whole point is to allow any site to access any other site, just like plain TCP sockets, without stealing your cookies.

If the site wants to access google.com with its own cookies, fine, why not?


Could you elaborate on the "stealing your cookies" part?

Cookies are sent only to the origin that set them and (except XSS attacks) are not revealed to anyone else. So who exactly is stealing them?


Well, currently, nothing. But currently, the web is completely broken.

If you want web applications to be powerful and open, you also need any web application to be able to access any URL.

Why should only mail.google.com be able to access my emails, and not also my-little-opensource-webmail.com ?

To facilitate that without also adding cookie stealing back in, you need to allow any website to open standard TCP sockets.


I proposed a header instead of a protocol btw

https://medium.com/@homakov/request-for-a-new-header-state-o...


Sounds good, but I suspect it will meet the same fate as XHTML 2: designed to be clean and perfect, but in reality it would take too much effort to implement and maintain.

From your professional experience you can probably tell that people would rather have a slightly insecure site that works and generates profit than a broken one, because SOTA started including some new feature you didn't know about...

People would rather enable these individual headers one by one and see their effect. In HTTP/2, headers are compressed, so it's not a big deal (besides looking ugly).


> SOTA started including some new feature you didn't know

If you sign up for version 2, changes in version 3 would not break you. And the point is that MANY things right now would be safe to turn on for 99.99% of sites, e.g. XFO. So, not much effort.


I guess this is what you get if you let an advertisement company define the web.


Because then they end up with a bug in how they do it, and oops.

When developing web applications, you must approach this from the perspective of "what is the oldest, least-secure, most bug-riddled pile of C++ and plugins someone could try to hit this with".

If you want an example of why this has to be the approach, well... six years ago the Django security team got an email from the Rails security team. Turned out something we'd both done in our CSRF protection systems didn't actually work. Protecting against CSRF while allowing XMLHttpRequest (remember this is 2011!) is kind of tricky, and the standard approach was one adopted by a lot of JavaScript toolkits: they'd set a consistent custom header (X-Requested-With) on the request. And since browsers only allowed that to be done on requests which obeyed the same-origin sandbox, it was reliable: you knew if you saw that header, it was an XMLHttpRequest that a browser had vetted for same-origin safety (or that it was someone faking a request outside of a browser, but that's not a CSRF vector).

And then it turned out that thanks to a bug in Flash plus the way browsers handled a certain obscure HTTP status code, you could actually set that header on a request to any domain. Oops, that's a complete CSRF bypass in Rails, Django and I don't even remember how many other things.

That's how we learned that particular lesson about trusting browsers to do the right thing, and I don't see myself ever trusting browser security like that again.
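
For anyone curious, a rough sketch of that pre-2011 pattern (endpoint and values are hypothetical): the toolkit sets the custom header on every XHR, and the server treats its presence as evidence that the browser enforced its same-origin rules -- the assumption the Flash bug broke:

    // Client side: JS toolkits of the era set this header on every XHR.
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/transfer');
    xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
    xhr.send('to=alice&amount=10');

    // Server side (pseudocode): only accept the state change when the
    // header is present, assuming browsers only let same-origin script
    // set it.
    //
    //   if (request.headers['x-requested-with'] !== 'XMLHttpRequest') {
    //     reject();
    //   }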


I'd say it's mostly because of advertising, but also a lot of similar tech (which is usually ad-supported), like Disqus.

It's interesting that today cross-domain sandboxing applies to almost everything except JavaScript. If I load an image cross-domain and draw it into a canvas, the contents of that canvas are sandboxed, but I can cheerfully mix and match code across domains too.

Seems like it would be a good thing to do but it would break a ton of stuff.
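
For example, a minimal sketch of the canvas tainting mentioned above (the image URL is hypothetical):

    var img = new Image();
    img.onload = function () {
      var canvas = document.createElement('canvas');
      var ctx = canvas.getContext('2d');
      ctx.drawImage(img, 0, 0);          // drawing a cross-origin image taints the canvas
      try {
        ctx.getImageData(0, 0, 1, 1);    // reading pixels back now throws
      } catch (e) {
        console.log('tainted canvas:', e.name);   // "SecurityError"
      }
    };
    img.src = 'https://other.example/picture.png';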


Having advertisers not track you seems like a benefit, not a con.


I agree, but of the four major browsers, two are directly underwritten by advertising (Chrome and Mozilla) and Microsoft is moving that way.

Only Apple has backed off advertising as a revenue source, so it basically comes down to Apple being willing to cause massive breakage (the way it did with Flash) in pursuit of a principle. The fact that they enabled ad blockers in mobile safari says they are at least sympathetic to the idea.


Did you mean Mozilla is an ad driven company?


A very large percentage of Mozilla's revenue comes from search engines (recently Yahoo, previously Google) who pay Mozilla to make themselves the default search engine on Firefox. If Firefox users saw no ads and were untrackable, Yahoo would have no reason to pay anymore.

Of course Mozilla doesn't try to force everyone to stick to the defaults, so you're free to change the default search engine and install a bunch of ad-blocking, anti-tracking add-ons.


A benefit for us; a con for those developing or sponsoring the browsers we use.


Isn't that what Safari does with the "Allow from current website only" setting? It defaults to "Allow from websites I visit", which means that only embedded content from sites you've visited before gets their cookies, not random new embeds.


Interesting. Does that mean that trackers like DoubleClick don't work on Safari with the default settings?


Wasn't it news a while ago that advertisers were using a hack to bypass this protection in Safari, and that it caused a bit of an uproar?


It's not the same, but don't httpOnly cookies kind of serve the same purpose? JS can't read these cookies at all.


JS can't (that protects against stealing the token), but the server still receives it even when the request originates from a foreign domain. That's the gist of CSRF [0].

[0]: https://en.wikipedia.org/wiki/Cross-site_request_forgery
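
For illustration, a minimal sketch of the classic attack (domains are hypothetical); the victim's browser attaches bank.example's cookies, httpOnly or not, to the cross-site request:

    <!-- On attacker.example: -->
    <form action="https://bank.example/transfer" method="POST">
      <input type="hidden" name="to" value="attacker">
      <input type="hidden" name="amount" value="1000">
    </form>
    <script>document.forms[0].submit();</script>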


Because not all websites can federate their users


Can I turn this off?


I wondered the same thing years ago. I always assumed browsers had implemented other security measures so that websites could avoid doing this.

Around 90-something percent of the websites I visit don't implement the `for(;;)` or `while(1)` solution.

So are we saying that they're vulnerable sites?


No, they’re not vulnerable; browsers fixed this bug a long time ago.


>So are we saying that they're vulnerable sites?

We are saying that they're vulnerable for THAT particular issue (the JSON hijacking), and that is only if they don't already have some other way of dealing with it.


> So are we saying that they're vulnerable sites?

Not necessarily, if all their API responses are top-level JSON objects.


The root object has to be an array I believe.


I had a hunch that this was to prevent people from including the resource in a script tag -- but I always wondered how they'd access the data, as a JSON expression on its own should technically be a no-op when interpreted as JS (or so I thought).

The overridden array constructor was the missing link.

Though couldn't you have it easier by making sure your top-level JSON structure is always an object?

As far as I know, while a standalone array expression []; is a valid JS statement, a standalone object expression {}; is not and would produce a syntax error.


Someone had the same question as you in a comment.

>Wouldn't returning an object containing the array, instead of the array directly, also solve the problem?

And someone else replied

>No, that wouldn't solve the problem since the same attacks mentioned in the post could still be performed. Overriding the accessor methods to retrieve the info.


Except I don't think a JSON object is valid Javascript by itself.


What about a JSON object do you think is invalid Javascript?


It's a valid subset of JavaScript (or at least was initially meant to be), but a JSON object (as opposed to an array) isn't a valid stand-alone JavaScript statement, and an attempt to eval it will throw an error:

    > eval('{"key": "value"}');
    SyntaxError: missing ; before statement
You can get around this by using parens:

    > eval('({"key": "value"})');
    Object { key: "value" }
But accidentally including a JSON URL in a script tag should fail to evaluate if there's an object at the top level.


All JSON literals are valid JS expressions but not all are valid statements. The two are different parts of the JS grammar.

As script blocks expect one or more statements to execute, the hack relies on the fact that some JSON (array literals) also happen to be valid statements in addition to expressions.
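
A quick illustration of the difference (a sketch, not Google's actual payloads):

    // As top-level statements (what a <script src="..."> include executes):
    [1, 2, 3];       // an expression statement evaluating an array literal
    { foo: 1 };      // a BLOCK containing the label "foo" and the expression
                     // statement 1 -- not an object literal at all
    // {"foo": 1};   // SyntaxError if uncommented: a string literal can't be a label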


While the person you're responding to might genuinely be confused about JSON usually being valid JavaScript, arbitrary JSON isn't guaranteed to be: http://timelessrepo.com/json-isnt-a-javascript-subset


> Though couldn't you have it easier by making sure your top-level JSON structure is always an object?

Yes, this works too. You’re correct about any keys causing a syntax error.


.NET does this by wrapping things with {'d': data}; I always thought this was the reason.


That is one weird array in Google's reply. Looks like it could have been an object instead, in which case JSON hijacking wouldn't be a problem.


I feel like the browser could use the Content-Type header to check whether the response is JSON or actual executable JavaScript, throwing an error if it's the former.


I haven't worked with JSON like that before. Do JSON parsers properly ignore the stuff Google puts in, or do you have to strip it out before parsing?


In the very first Stack Overflow answer:

> an AJAX request at mail.google.com will have full access to the text content, and can strip it away.
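
In other words, parsers don't ignore it; the legitimate same-origin client strips the known prefix before parsing. A minimal sketch (endpoint and prefix are assumptions):

    fetch('/api/data', { credentials: 'same-origin' })
      .then(function (res) { return res.text(); })
      .then(function (text) {
        var prefix = 'while(1);';            // or "for(;;);", ")]}'", ...
        if (text.indexOf(prefix) === 0) {
          text = text.slice(prefix.length);
        }
        return JSON.parse(text);
      })
      .then(function (data) { console.log(data); });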


Google use cookies to authenticate API requests?


Not sure if they do, but why not? It's just an opaque token in the HTTP request content, same as any other opaque token.


Pretty sure browsers no longer permit overriding ctors for literals.


The John Resig post linked by TFA indicates that this was originally dealt with by locking the constructor altogether:

   function Array(){
     alert("hello, I found something of yours!");
   }
   // ERROR: redeclaration of const Array
But it appears that the restriction now applies only to literals, as I can do this in at least Chrome and Firefox:

    > function Array() {console.log("hope")}
    undefined
    > var x = new Array(3);
    hope
    undefined
https://johnresig.com/blog/re-securing-json/


Every time I read about such constructs, it makes me realize, as a regular developer, how complex web application security is and how difficult it is to think through and protect your application against each and every such potential problem.


Note that these protections are only needed because Google supports every imaginable browser version, even outdated ones. You most certainly do not need to do the same.

The Array and Object globals cannot be overridden for literals any more (since 2007) [0], and for the ambient-authority problem with CORS, just check the Origin header.

[0]: https://johnresig.com/blog/re-securing-json/
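
For completeness, a sketch of that Origin check (assuming a Node-style request object and a made-up allow-list):

    // How to treat a missing Origin header (old clients, some same-origin
    // requests) is a policy decision; this sketch is strict about it.
    var ALLOWED_ORIGINS = ['https://app.example.com'];

    function isAllowedOrigin(req) {
      var origin = req.headers['origin'];
      return origin !== undefined && ALLOWED_ORIGINS.indexOf(origin) !== -1;
    }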


That would be a good note to add to that Stack Overflow question.


> You most certainly do not need to do the same.

... except that those browsers are still out there, so whether you need to do the same depends heavily on how much damage someone could do by abusing the data your server can emit.


If you are browsing the web with a 10-year-old browser, you are opening yourself up to a ton of security bugs. Whether JSON responses contain a while loop or not isn't going to make a difference.

The reason Google and Facebook keep this kind of stuff around is because it's there and doesn't hurt to keep it. There's a slight chance it will provide some protection if a similar attack vector is discovered.


But aren't you saying this is an already existing attack vector, then? Why try to find a similar one if you know you can just get an older browser version and use this one? Isn't that a good enough reason to be prepared for it?


As sagethesagesage said [1], you're protecting the user from having their browser pass the user's data from your site to a malicious site. The attacker shouldn't be able to make the user run an old, vulnerable browser.

[1] https://news.ycombinator.com/item?id=14282532


It's more that people who use those browsers are being protected. It's not that those browsers can poke security holes in the site, they're just vulnerable to losing their own data.


Yup! In my personal (and basically worthless) opinion, this is why the entire "web application" ecosystem is a giant, flawed mess. It's basically what happens when a system originally designed to represent and transfer rich textual documents (HTML/HTTP) is bastardized into an application architecture.

Yes, I'm being somewhat hyperbolic. Bring on the downvotes! ;-)


This kind of criticism misses the point. The web is not designed. It is evolved. Various bits of it were designed at their outset, but it was literally impossible to envision all the implications of those design decisions.

This is not a bad thing, for the simple reason that every long-lived complex system involving many humans must behave this way.

Any attempt to top-down design the perfect, universal, distributed application runtime hits fundamental social problems not unlike those in a centrally planned economy: too much information to integrate, too many stubbornly uncooperative humans with their own divergent goals and opinions.

Systems at this scale are much more like biology than like circuit design.


Also JavaScript itself was designed to evolve in a backwards-compatible way. The way developers use JavaScript today is quite different from how they used it 10 years ago.

The idea that systems are fixed entities that have to be designed correctly up-front is wrong and is one of the reasons why the Waterfall model of software development has been superseded by Agile.

Good systems have to be designed to handle change. Change is the only constant thing in this world.


Evolution at least has mass extinction events. Let's hope that Web 2.0, which resembles a gigantic, evolved kraken covered in protuberances analogous to the appendix of the large intestine, suffers one sometime in the future.


It's a moot point. Very few professional web application developers would disagree with you. The problem is that this is the world we live in, and if you don't develop web applications in the consumer space, you'll get eaten alive by the competitors who will.


It's worse than that. Because better solutions were "hard" or long-term, and competing organizations couldn't agree on shared standards, they took an application and protocol designed to traverse documents and piled complex hacks on top until it essentially became a pseudo-operating system, one that now not only drives part of the global economy but has also changed the type and quality of information most people receive.


So you're basically saying that the current web is a reflection of humankind, with all its flaws and quirks? :)


I like to think web browsers take the worse-is-better approach to security.

Security takes a back seat to reproductive fitness of the web as a platform. JS made the web insecure, but it also made it the world's premier application platform.

I blogged about this: http://kylebebak.github.io/post/browser-security-worse-is-be...


This problem isn't limited to web applications. Think about how many security problems happen on the server.

All sufficiently complex ecosystems are a giant, flawed mess.


Modern web development is already hard by itself, especially when it comes to security. A saner runtime language is needed to replace the subpar standard that is JavaScript -- one with a robust type system and coherent semantics. It won't fix every problem, but at least it would prevent abuses such as the one in question.


WASM (WebAssembly) is about developing a very simple cross-browser bytecode that allows implementing any runtime on top of it. The first versions are already rolling out in the latest major browser versions, but at this stage you don't yet get DOM access from WASM. After the initial phase, once DOM access is implemented, it's the beginning of the end for JavaScript. Future browsers might well implement JS as a pre-shipped runtime targeting the internal WASM core.


Web Assembly is specifically designed not to replace JavaScript [0].

[0]: http://webassembly.org/docs/faq/#is-webassembly-trying-to-re...


I was commenting to the GP about technologies to replace JavaScript. In the long term WASM is the best candidate, though replacing JS is indeed not one of the intended goals of the project. JS will be with us eternally, rest assured. But if DOM-enabled WASM one day gains wide adoption, developers targeting the contemporary browsers of the future would at least have a wider selection of runtimes to choose from in addition to JS.


On the other hand, if you thought modern browsers are bloated, just wait for everyone to compile their runtimes on top of WASM.

It's not very hard to imagine, especially in an enterprise environment, running a browser 15-20 years from now and that browser loading the equivalent of the JVM, .NET CLR, Ruby VM, etc., on top of WASM :)


15-20 years from now, it's likely that "browser" will just be the operating system.


This actually reminds me of "es-operating-system", an experimental operating system copyrighted by Nintendo (yes, Nintendo!), where "every system API is defined in Web IDL".

AFAIK it never went anywhere, but maybe building an entirely new OS/Browser based around WebIDL seemed less insane 10 years ago.

https://code.google.com/archive/p/es-operating-system/


We'll all have gigabit connections by then. So even though it'll be 100 mb of bloat, it will still load the same as today ;)


Yes but they'll be pwned by the FCC (and friends). Don't count on it.


I can also see it happening that browsers will one day shut down plain JavaScript and only allow WASM -- certainly if the security burden becomes too big.


That's awfully optimistic of you.

First of all browsers are committed to backwards compatibility.

Secondly, there are huge amounts of JavaScript being written right now; nobody's going to throw away billions of dollars worth of investment. People complain about COBOL written in the '60s, when programmers numbered in the thousands. JavaScript today is written by millions of programmers.

And thirdly, JavaScript evolves, as do browsers.


I can certainly see a world where WASM and JS execution pipelines in browsers converge -- where the form used for executing WASM and JS is the same.


Once WASM becomes established, I would expect JS to become "yet another runtime" on it.


That's what they have to tell to placate JS apologists.


You're basically letting strangers run code on your computer. That's basically what a "website" is. It is truly impressive to me how we can have something so complex and still manage to somehow keep it (usually) secure.


That's what "software" is, dude.


That's true, but the scale is different by orders of magnitude, and people have grown far more trusting of random websites than random software.


Can I hire you to make quips like this during meetings when people say the most obvious shit?


Indeed, there are so many traps you can fall into when writing web apps. It feels like when the web was designed, security was not given due attention and effort.


FB prepends "for(;;);", which is 1 char shorter than "while(1);"; this has been the case since 2012/13.

Firebug v2 and Chrome DevTools know how to parse such JSON and ignore that first part. (IE11 and Firefox's newer DevTools can't "handle" it, i.e. they show just a plain text string.)


'for(;;);' probably compresses better than 'while(1)' too. Semicolons are (very :) common in JavaScript code and the for idiom repeats them three times.


Why does it have to be a loop? Couldn't you make a reliable syntax error in fewer than 8 characters?


The risk there is some parsers might carry on past the syntax error and try to continue parsing. This is JavaScript after all.


No, that’s not a real risk.


I’m not sure why this is downvoted. No JavaScript engine does that. “This is JavaScript after all” is ridiculous FUD.


I was sure I had used browsers which did that; if I didn't, then sorry, I must be hallucinating.

I wouldn't call it FUD; I'm not suggesting you don't use JavaScript, and we are already talking in this article about one crazy workaround because of the weirdness of modern jazz development!

The "this is JavaScript after all" referred to JavaScript tending to continue after errors (which it does in some cases, like a bad callback, or a whole file which didn't parse).


> one crazy workaround because of the weirdness of modern jazz development

Autocorrect of “JS”? If so: it’s not a modern weirdness; this is an old, long-fixed browser bug.

FWIW, JavaScript continues executing after errors in cases where it makes sense. If an event listener throws an error, it doesn’t make much sense for it to stop all future events without crashing the page (which is kind of what IE used to do with its script error message box, and we know how that turned out).


The browser may disclose part of the JSON content in a "parse error" error message. A window.onerror handler could catch this message.

I believe that some browsers used to do that some time ago.
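
A sketch of the idea (URLs are hypothetical); modern browsers report cross-origin script errors to onerror only as "Script error.", so nothing is disclosed any more:

    <script>
      window.onerror = function (message) {
        // In some older browsers this message could include a fragment of
        // the cross-origin response that failed to parse.
        console.log('captured:', message);
      };
    </script>
    <script src="https://victim.example/api/contacts.json"></script>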


The offending website could have error handling to catch and discard the syntax error (e.g. a global uncaught exception handler). They won't be able to read the JSON, but otherwise they'd be OK.

But if they get hit with this, they'd be actively hurt, and I don't think that interpreter session would be able to recover.


You can, and Google does. For example:

https://fonts.google.com/metadata/fonts

And here's the code in Angular's $http service for stripping out that string:

https://github.com/angular/angular.js/blob/master/src/ng/htt...
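
Roughly, that transform strips a known guard prefix before parsing; a sketch of the same idea (the exact pattern Angular uses may differ):

    var JSON_GUARD_PREFIX = /^\)\]\}',?\n/;

    function parseGuardedJson(text) {
      return JSON.parse(text.replace(JSON_GUARD_PREFIX, ''));
    }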


8 chars is already pretty short. If you're concerned about the length, don't be. A TCP packet is at least 512 bytes.


This is definitely the sort of thing you'd want to gather data on. It's plausible that it could save Terabytes of bandwidth per day at Facebook/Google scale.


That's like saying a car wash can save 3 drops of water at the end of the day.


An extra character will cause 1/512 of responses to take an extra packet, so the amortized cost is still one character per response. Presumably this matters at scale.


Not if your average response is less than 512.

If your responses are all between 505 and 512 in length then it might matter but most likely you are prematurely optimizing.


Why not "while(0)"? Then an eval wouldn't do anything.


Because it's not about eval(), as the link you're commenting on explains in detail?


If I'm understanding it correctly, though, prepending while(0) or even if(0) to the JSON would prevent the attack, because the JSON object would not actually be executed. I think they were asking if there was any particular reason to prefer the infinite loop over that.

The answer that comes to mind for me is that having the script hang is a more obvious failure state than simply skipping over the statement, and it makes it more immediately apparent that something has gone wrong.


Jeez, why not live w/o JavaScript?

We keep trying to accommodate a defunct language with insoluble problems. Isn't that an error in our thinking processes?

https://www.wired.com/2015/11/i-turned-off-javascript-for-a-...


Not only that. They keep piling on new shit APIs. Half of the WebGL exploits are probably yet to be unveiled; there's a side-channel attack using the new ambient light sensor API, but hey, my phone can get darker based on ambient light, hooray!



