A method to use Google for DDoS. Bug or Not? (chr13.com)
104 points by chr13 on March 10, 2014 | 50 comments



Nice catch. I'm not so sure about:

  A simple fix will be just crawling the links without the request parameters so that we don’t have to suffer.
Many links would fail or have different content if the request parameters were removed from the URL. Perhaps the crawler could use some kind of reverse bloom filter [1] to be more careful/back off if it receives the same content from multiple URLs. However, nothing is simple at Google scale, so there are probably issues with this approach too.

[1]: http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a...
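
A rough sketch of that idea in Python (names and sizes are mine, purely illustrative, not anything Google actually uses):

    import hashlib

    class ReverseBloomFilter:
        # Answers "definitely seen recently" with possible false negatives:
        # each slot remembers only the most recent fingerprint hashed to it.
        def __init__(self, size=1 << 20):
            self.size = size
            self.slots = [None] * size

        def seen_before(self, content: bytes) -> bool:
            fingerprint = hashlib.sha256(content).digest()
            index = int.from_bytes(fingerprint[:8], "big") % self.size
            if self.slots[index] == fingerprint:
                return True                      # same bytes fetched again recently
            self.slots[index] = fingerprint      # remember it; may evict an older entry
            return False

The crawler could back off from a host once seen_before() starts returning True for responses coming from several distinct URLs on that host.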


You can always change that to

    =image("http://targetname/1.jpg")   
    =image("http://targetname/2.jpg")   
    =image("http://targetname/3.jpg")


But what if 2.jpg doesn't exist? Or is a trivially small file?

The advantage of the query-string method is that you can just find one suitable (i.e. huge) file and force Google to pull it down many times.
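
i.e. reuse one known-large file with a throwaway parameter, something like:

    =image("http://targetname/1.jpg?r=1")
    =image("http://targetname/1.jpg?r=2")
    =image("http://targetname/1.jpg?r=3")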


I'm not surprised at Google's response, since this looks to me along the same lines as putting lots of images in your signature in a popular forum; although in that case it really is a DDoS.

Maybe Google should consider putting a bandwidth limiter of some sort on that (or even better: use hashes to avoid duplicates), but I think screaming "security! vulnerability!" is not a good action to take here...


How could Google use hashes to avoid duplication? They'd have to download each link before they could hash the contents thereof, so the damage would still be done.


The damage could be limited to 3 downloads per Google document. If 3 downloads produce 3 identical hashes, then start a limiter / throw up a captcha / delay, to avoid heavy intra-document duplication.


> How could Google use hashes to avoid duplication?

Rate limit per website (e.g. don't download more than 10 images per domain per second)

Limit the total number of images it downloads per document, so a single user cannot cause too much traffic.
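
A rough sketch of both limits combined (all numbers and names here are made up for illustration):

    import time
    from collections import defaultdict, deque

    class FetchLimiter:
        def __init__(self, per_domain_per_sec=10, per_document_max=100):
            self.per_domain_per_sec = per_domain_per_sec
            self.per_document_max = per_document_max
            self.domain_hits = defaultdict(deque)   # domain -> recent fetch timestamps
            self.doc_counts = defaultdict(int)      # document id -> fetches so far

        def allow(self, domain, doc_id):
            now = time.monotonic()
            hits = self.domain_hits[domain]
            while hits and now - hits[0] > 1.0:     # drop timestamps older than one second
                hits.popleft()
            if len(hits) >= self.per_domain_per_sec:
                return False                        # too many fetches for this domain right now
            if self.doc_counts[doc_id] >= self.per_document_max:
                return False                        # this document has used up its quota
            hits.append(now)
            self.doc_counts[doc_id] += 1
            return True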


In that case, users may notice a performance decrease in spreadsheets for images from certain websites.


http://en.wikipedia.org/wiki/HTTP_ETag

(I know that servers can be configured not to send ETags or break caches by sending random ones every time, but this could reduce the data usage considerably since most of the responses would only include the headers.)
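
For reference, the conditional-request flow looks roughly like this (hypothetical URL, using Python's requests library):

    import requests

    url = "http://example.com/1.jpg"          # hypothetical image URL

    first = requests.get(url)
    etag = first.headers.get("ETag")

    # Revalidate instead of re-downloading: a 304 response carries no body.
    second = requests.get(url, headers={"If-None-Match": etag} if etag else {})
    body = first.content if second.status_code == 304 else second.content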


The query parameters make each request different. ETags are not unique across the internet, just for a specific URL. There is no way an ETag would help here, unless the same request is made later. Even making a request with an ETag still means lots of headers returned, which, while not 10MB, will add up to lots of traffic.


But they could hash the filename (a hash prevents accidental disclosure of content).


Hashing the filename doesn't help: the URL is different, which is why caching doesn't work.

If we ignore that ETags are tied to URLs rather than files, an ETag as suggested by userbinator might work for some cases. But if the large file is dynamically generated, it's unlikely to have an ETag, and the defaults in many servers derive the ETag from the file's inode rather than its contents, so if there are multiple servers behind a load balancer, they're likely to return different ETags.


I've seen this bug floated around a few times, with the request parameters and all. Interestingly enough, you do not have to use an image either, and can link to any document on the server. In addition, it will work with nondeterministic values. So you can do (for example):

    =image(CONCATENATE("http://example.com/?", RAND()))

If you add this to a spreadsheet and fill a few thousand rows with it, then each time the spreadsheet is loaded, Google will hit the server a few thousand times.


The other huge problem here is that Google's FeedFetcher doesn't respect robots.txt. (Their reasoning is that it is acting at the direct request of a human to retrieve a specific resource, so it doesn't count as a bot.) Because of this, there is no easy way to stop it from hitting your site.


You can block the user agent; I believe "Feedfetcher-Google" should work.


True, but (while possible) it's not straightforward to block access to specific files only. The same user agent is also used for Google Custom Search if you're using that. And it's still going to be hammering your firewall (although admittedly that's less catastrophic than trying to download a 10MB file repeatedly).


I've decided to stress test this idea. This is the error Google gives:

"It's taking a while to calculate formulas. More results may appear shortly."

I set the spreadsheet document to load images like so: =image("http://example.com/image?id=146&r={increment here}")

After 30 or so images, Google starts to slow down its fetch rate.


Multiple spreadsheets?


30 per spreadsheet, it would seem. It seems that if the document takes longer than 2 seconds to load, it starts limiting itself.


I wonder how much visibility on HN it requires for the switch from "not a bug" to "definitely, for sure a bug" to happen?


> Since Google uses multiple IP addresses to crawl it’s also difficult to block these type of GET flood

It wouldn't be too hard to block by the User-Agent string "Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)" if you notice the traffic.

Feedfetcher does not fetch robots.txt though, so you'd have to do something in your server config.

[edit: fixed a typo, and agree with the update]
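
For illustration, the same idea at the application layer rather than in the server config (a hypothetical WSGI middleware; normally you'd do this directly in Apache/nginx):

    def block_feedfetcher(app):
        # Refuse any request whose User-Agent mentions Feedfetcher-Google.
        def middleware(environ, start_response):
            user_agent = environ.get("HTTP_USER_AGENT", "")
            if "Feedfetcher-Google" in user_agent:
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Forbidden"]
            return app(environ, start_response)
        return middleware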


Thanks, I've updated the post.


This is about two years old: http://www.behind-the-enemy-lines.com/2012/04/google-attack-...

I would hope that Google is able to detect abuse of their infrastructure for (D)DoS.


Indeed, I've quoted that article. But it doesn't talk about random parameters, which make it so easy to attack any website, not just your own where you know what the URLs are.


Nothing mind-blowing; it's really the same vulnerability, and there are many ways to extend the core issue.


Nice catch!

I don't think removing the parameters would be ideal, though, since some sites might legitimately serve up different images based on different parameters.

Just limiting the amount of traffic to a single server, or outbound from a single spreadsheet, seems like a good solution, though.


Yes, of course. But then do those dynamic images serve any purpose in a spreadsheet? If a user needs a dynamic image he can download it to his own machine and upload it. Of course, if he needs many dynamic images, then that's another question.


> If a user needs a dynamic image he can download it to his own machine and upload it.

Doesn't that somewhat defeat the purpose of a dynamic image?


The reason I use Google Docs is to have Google fetch and do the rendering for me. I don't include images in my spreadsheets, but I certainly think some people do. It's a feature.

Moreover, the issue is not the feature itself; it's whether Google should limit the number of requests made per image. From other comments it seems like Google is hitting each image hundreds or even thousands of times. I suspect this is for caching? If that's the case, Google should look at a better way to handle it. A single fetch, propagated to the closest zone, should be enough. But this is not a reason to limit the feature (by eliminating query parameters).


I've seen people use Excel with millimetre-scaled columns for creating bills and other documents instead of using Word, no joke.

You see a surprisingly high number of Excel-based bill templates, and you may want to hotlink the company logo or a signature.


I'd say if Google can send this much traffic automatically, it is most certainly a bug. They should engineer some sort of upper limit on the amount of traffic sent to a single IP so as not to perform denial-of-service attacks on it.


Doesn't Facebook do something similar for preview links in chat and/or wall posts? You're probably limited by the number of messages/posts, but I wonder if that could be exploited with a large number of Facebook accounts.


Yes, the same goes for Twitter. When you paste a link into your tweets, a Twitter bot will crawl the link to show preview information.


There are reports that Skype does it too.


I don't want to disclose anything at this point, but at least one other owner of huge bandwidth suffers from this type of attack. Combined, it is clearly a disaster for any small-to-medium business.


A better solution might be to throttle multiple requests to the same domain. Also, they could prevent it from fetching the same content multiple times and instead use a cache based on ETags and file information.


Fetching the content without request parameters, as the author suggests, is not a fix. It depends entirely on the server and the way the content is stored, but request parameters can determine what sort of content is fetched, and removing those parameters can produce an invalid request.


I guess a way to protect against this would be to set up a redirect at the web server level, so that any URL for a static file redirects to its canonical representation. After the redirect, I think the file shouldn't be downloaded again.
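
A sketch of that idea as a hypothetical WSGI middleware (the extension list is just an assumption; the real thing belongs in the web server config):

    from urllib.parse import quote

    STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip")

    def canonicalize_static(app):
        # Redirect static-file requests that carry a query string to the bare
        # path, so fetches with random parameters collapse onto one cacheable URL.
        def middleware(environ, start_response):
            path = environ.get("PATH_INFO", "")
            if environ.get("QUERY_STRING") and path.lower().endswith(STATIC_EXTENSIONS):
                start_response("301 Moved Permanently", [("Location", quote(path))])
                return [b""]
            return app(environ, start_response)
        return middleware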


Many of the questions here are already answered officially by Google: https://support.google.com/webmasters/answer/178852


Maybe they should state visibly that FeedFetcher crawling your website can incur 1TB of bandwidth in a couple of hours? Maybe Apache and other httpds should block these crawlers by default? How else would anyone know about a Google feature that could be a disaster for them?


DaaS - DDoS as a service :)


DDoS as a Service has been around for years; the difference is that Google is trying to muscle in and undercut the competition with a free service. Unlike the extant players, their customer service is nonexistent, so I'm sure the big DaaS companies won't lose too much business...


Imagine the DDoS providers filing an antitrust case against Google :P


Even kiddies like me can DDoS now... Where shall I begin... ;)


That's pretty clever, good find! I wonder if you can do the same stuff in Office 365, etc.?


Any Googlers wanna comment on this, especially if you're from the Docs division?


My hat tip to you. Nice find!


Reminds me of the kind of article I used to read in 2600. Nice post!


Please post a how-to tutorial. Like, where do I post the image list?


Fascinating!



