A method to use Google for DDoS. Bug or Not? (chr13.com)
104 points by chr13 on March 10, 2014 | 50 comments



Nice catch. I'm not so sure about:

  A simple fix will be just crawling the links without the request parameters so that we don’t have to suffer.
Many links would fail or have different content if the request parameters were removed from the URL. Perhaps the crawler could use some kind of reverse bloom filter [1] to be more careful/back off if it receives the same content from multiple URLs. However, nothing is simple at Google scale, so there are probably issues with this approach too.

[1]: http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a...
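
A rough sketch of that idea in Python (names and sizes are mine, purely illustrative, not anything Google actually uses):

    import hashlib

    class ReverseBloomFilter:
        # Answers "definitely seen recently" with possible false negatives:
        # each slot remembers only the most recent fingerprint hashed to it.
        def __init__(self, size=1 << 20):
            self.size = size
            self.slots = [None] * size

        def seen_before(self, content: bytes) -> bool:
            fingerprint = hashlib.sha256(content).digest()
            index = int.from_bytes(fingerprint[:8], "big") % self.size
            if self.slots[index] == fingerprint:
                return True                      # same bytes fetched again recently
            self.slots[index] = fingerprint      # remember it; may evict an older entry
            return False

The crawler could back off from a host once seen_before() starts returning True for responses coming from several distinct URLs on that host.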


You can always change that to

    =image("http://targetname/1.jpg")   
    =image("http://targetname/2.jpg")   
    =image("http://targetname/3.jpg")


But what if 2.jpg doesn't exist? Or is a trivially small file?

The advantage of the query-string method is that you can just find one suitable (i.e. huge) file and force Google to pull it down many times.
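
i.e. reuse one known-large file with a throwaway parameter, something like:

    =image("http://targetname/1.jpg?r=1")
    =image("http://targetname/1.jpg?r=2")
    =image("http://targetname/1.jpg?r=3")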


I'm not surprised at Google's response, since this looks to me along the same lines as putting lots of images in your signature in a popular forum; although in that case it really is a DDoS.

Maybe Google should consider putting a bandwidth limiter of some sort on that (or even better: use hashes to avoid duplicates), but I think screaming "security! vulnerability!" is not a good action to take here...


How could Google use hashes to avoid duplication? They'd have to download each link before they could hash the contents thereof, so the damage would still be done.


The damage could be limited to 3 downloads per Google document. If 3 downloads produce 3 identical hashes, then start a limiter / throw up a captcha / delay, to avoid heavy intra-document duplication.


> How could Google use hashes to avoid duplication?

Rate limit per website (e.g. don't download more than 10 images per domain per second)

Limit the total number of images it downloads per document, so a single user cannot cause too much traffic.
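
A rough sketch of both limits combined (all numbers and names here are made up for illustration):

    import time
    from collections import defaultdict, deque

    class FetchLimiter:
        def __init__(self, per_domain_per_sec=10, per_document_max=100):
            self.per_domain_per_sec = per_domain_per_sec
            self.per_document_max = per_document_max
            self.domain_hits = defaultdict(deque)   # domain -> recent fetch timestamps
            self.doc_counts = defaultdict(int)      # document id -> fetches so far

        def allow(self, domain, doc_id):
            now = time.monotonic()
            hits = self.domain_hits[domain]
            while hits and now - hits[0] > 1.0:     # drop timestamps older than one second
                hits.popleft()
            if len(hits) >= self.per_domain_per_sec:
                return False                        # too many fetches for this domain right now
            if self.doc_counts[doc_id] >= self.per_document_max:
                return False                        # this document has used up its quota
            hits.append(now)
            self.doc_counts[doc_id] += 1
            return True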


In that case, users may notice a performance decrease in spreadsheets for images from certain websites.


http://en.wikipedia.org/wiki/HTTP_ETag

(I know that servers can be configured not to send ETags or break caches by sending random ones every time, but this could reduce the data usage considerably since most of the responses would only include the headers.)
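
For reference, the conditional-request flow looks roughly like this (hypothetical URL, using Python's requests library):

    import requests

    url = "http://example.com/1.jpg"          # hypothetical image URL

    first = requests.get(url)
    etag = first.headers.get("ETag")

    # Revalidate instead of re-downloading: a 304 response carries no body.
    second = requests.get(url, headers={"If-None-Match": etag} if etag else {})
    body = first.content if second.status_code == 304 else second.content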


The query parameters make each request different. ETags are not unique across the internet, just for a specific URL. There is no way an ETag would help here, unless the same request is made later. Even making a request with an ETag still means lots of headers returned, which, while not 10MB, will add up to lots of traffic.


But they could hash the filename (a hash prevents accidental disclosure of content).


Hashing the filename doesn't help: the URL is different, which is why caching doesn't work.

If we ignore that ETags are tied to URLs rather than files, an ETag as suggested by userbinator might work for some cases. But if the large file is dynamically generated, it's unlikely to have an ETag, and the defaults in many servers derive the ETag from the file's inode rather than its contents, so if there are multiple servers behind a load balancer, they're likely to return different ETags.


I've seen this bug floated around a few times, with the request parameters and all. Interestingly enough, you do not have to use an image either, and can link to any document on the server. In addition, it will work with nondeterministic values. So you can do (for example):

    =image(CONCATENATE("http://example.com/?", RAND()))

If you add this to a spreadsheet and fill a few thousand rows with it, then each time the spreadsheet is loaded, Google will hit the server a few thousand times.


The other huge problem here is that Google's FeedFetcher doesn't respect robots.txt. (Their reasoning is that it is acting at the direct request of a human to retrieve a specific resource, so it doesn't count as a bot.) Because of this, there is no easy way to stop it from hitting your site.


You can block the user agent; I believe "Feedfetcher-Google" should work.


True, but (while possible) it's not straightforward to block access to specific files only. The same user agent is also used for Google Custom Search if you're using that. And it's still going to be hammering your firewall (although admittedly that's less catastrophic than trying to download a 10MB file repeatedly).


I've decided to stress test this idea. This is the error Google gives:

"It's taking a while to calculate formulas. More results may appear shortly."

I set the spreadsheet document to load images like so: =image("http://example.com/image?id=146&r={increment here}")

After 30 or so images, Google starts to slow down its fetch rate.


Multiple spreadsheets?


30 per spreadsheet, it would seem. It seems that if the document takes longer than 2 seconds to load, it starts limiting itself.


I wonder how much visibility on HN it requires for the switch from "not a bug" to "definitely, for sure a bug" to happen?


> Since Google uses multiple IP addresses to crawl it’s also difficult to block these type of GET flood

It wouldn't be too hard to block by the User-Agent string "Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)" if you notice the traffic.

Feedfetcher does not fetch robots.txt though, so you'd have to do something in your server config.

[edit: fixed a typo, and agree with the update]
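
For illustration, the same idea at the application layer rather than in the server config (a hypothetical WSGI middleware; normally you'd do this directly in Apache/nginx):

    def block_feedfetcher(app):
        # Refuse any request whose User-Agent mentions Feedfetcher-Google.
        def middleware(environ, start_response):
            user_agent = environ.get("HTTP_USER_AGENT", "")
            if "Feedfetcher-Google" in user_agent:
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Forbidden"]
            return app(environ, start_response)
        return middleware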


Thanks, I've updated the post.


This is about two years old: http://www.behind-the-enemy-lines.com/2012/04/google-attack-...

I would hope that Google is able to detect abuse of their infrastructure for (D)DoS.


Indeed, I've quoted that article. But it doesn't talk about random parameters, which make it so easy to attack any website, not just your own where you know what the URLs are.


Nothing mind-blowing; it's really the same vulnerability, and there are many ways to extend the core issue.


Nice catch!

I don't think removing the parameters would be ideal, though, since some sites might legitimately serve up different images based on different parameters.

Just limiting the amount of traffic to a single server, or outbound from a single spreadsheet, seems like a good solution, though.


Yes, of course. But then do those dynamic images serve any purpose in a spreadsheet? If a user needs a dynamic image he can download it to his own machine and upload it. Of course, if he needs many dynamic images, then that's another question.


> If a user needs a dynamic image he can download it to his own machine and upload it.

Doesn't that somewhat defeat the purpose of a dynamic image?


The reason I use Google Docs is to have Google fetch and do the rendering for me. I don't include images in my spreadsheets, but I certainly think some people do. It's a feature.

Moreover, the issue is not the feature itself; it's whether Google should limit the number of requests made per image. From other comments it seems like Google is hitting each image hundreds or even thousands of times. I suspect this is for caching? If that's the case, Google should look at a better way to handle it. A single fetch, propagated to the closest zone, should be enough. But this is not a reason to limit the feature (by eliminating query parameters).


I've seen people use Excel with millimetre-scaled columns for creating bills and other documents instead of using Word, no joke.

You see a surprisingly high number of Excel-based bill templates, and you may want to hotlink the company logo or a signature.


I'd say if Google can send this much traffic automatically, it is most certainly a bug. They should engineer some sort of upper limit on the amount of traffic sent to a single IP so as not to perform denial-of-service attacks on it.


Doesn't Facebook do something similar for preview links in chat and/or wall posts? You're probably limited by the number of messages/posts, but I wonder if that could be exploited with a large number of Facebook accounts.


Yes, the same goes for Twitter. When you paste a link into your tweets, a Twitter bot will crawl the link to show preview information.


There are reports that Skype does it too.


I don't want to disclose anything at this point, but at least one other owner of huge bandwidth suffers from this type of attack. Combined, it is clearly a disaster for any small-to-medium business.


A better solution might be to throttle multiple requests to the same domain. Also, they could prevent it from fetching the same content multiple times and instead use a cache based on ETags and file information.


Fetching the content without request parameters, as the author suggests, is not a fix. It depends entirely on the server and the way the content is stored, but request parameters can determine what sort of content is fetched, and removing those parameters can produce an invalid request.


I guess a way to protect against this would be to set up a redirect at the web server level, so that any URL for a static file redirects to its canonical representation. After the redirect, I think the file shouldn't be downloaded again.
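
A sketch of that idea as a hypothetical WSGI middleware (the extension list is just an assumption; the real thing belongs in the web server config):

    from urllib.parse import quote

    STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip")

    def canonicalize_static(app):
        # Redirect static-file requests that carry a query string to the bare
        # path, so fetches with random parameters collapse onto one cacheable URL.
        def middleware(environ, start_response):
            path = environ.get("PATH_INFO", "")
            if environ.get("QUERY_STRING") and path.lower().endswith(STATIC_EXTENSIONS):
                start_response("301 Moved Permanently", [("Location", quote(path))])
                return [b""]
            return app(environ, start_response)
        return middleware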


Many of the questions here are already answered officially by Google: https://support.google.com/webmasters/answer/178852


Maybe they should state visibly that FeedFetcher crawling your website can incur 1TB of bandwidth in a couple of hours? Maybe Apache and other httpds should block these crawlers by default? How else would anyone know about a Google feature that could be a disaster for them?


DaaS - DDoS as a service :)


DDoS as a Service has been around for years; the difference is that Google is trying to muscle in and undercut the competition with a free service. Unlike the extant players, their customer service is nonexistent, so I'm sure the big DaaS companies won't lose too much business...


Imagine the DDoS providers filing an antitrust case against Google :P


Even kiddies like me can DDoS now... Where shall I begin... ;)


That's pretty clever, good find! I wonder if you can do the same stuff in Office 365, etc.?


Any Googlers wanna comment on this, especially if you're from the Docs division?


My hat tip to you. Nice find!


Reminds me of the kind of article I used to read in 2600. Nice post!


Please post a how-to tutorial. Like, where do I post the image list?


Fascinating!



