Subtle bug in Google can get you banned
61 points by jacquesm on Oct 3, 2010 | 28 comments
I just had my IP banned from Google after enabling the 'Google toolbar', and at first I couldn't understand why.

The toolbar is useful because it allows you to see the PageRank of your pages; after enabling it, the browser will restart.

And that's where the problem occurs: if you have a large number of browser windows or tabs open when you do this (or if you do a session recovery after a browser crash), Google will interpret the flurry of toolbar requests when the browser comes back up as automated requests to their servers and will block your IP accordingly.

Highly annoying! Effectively, the use of one (luxury) Google service disables the use of another that is far more essential.

I hope there is a way out of 'toolbar induced google purgatory'.

update: I can use google again (after 15 minutes), but the toolbar still does not function.




There are a lot of people that scrape Google pretty badly, so we do need to have protection against bots, including ones that look like the Google toolbar. If you're resuming ~50 tabs, I can believe that might look like a scraper to us for a while. I'm glad you could do regular Google searches again after 15 minutes or so.


So, are you seriously telling me that google can't tell the difference between their own toolbar used by a logged in user and a bot?

How about changing the toolbar code so it paces the requests to something that sits below the frequency of the 'ban for bot use' trigger? That would seem to me to be an obvious fix.


Bots can probably perfectly duplicate the behavior of a toolbar. Only the rate and volume of requests would be different.

I'm assuming the toolbars can't communicate between each other. On toolbar launch, it should pick a random number between 1 and x and wait that many ms before contacting google. Pick x by looking at the number of req/sec that trigger a ban and the high-end number of tabs a power user might restart with. This would spread the requests out over that time period and keep it under the ban.
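A minimal sketch of that idea, assuming the toolbar code runs once per tab and can use window.setTimeout; MAX_JITTER_MS and fetchPageRank are hypothetical names and the numbers are made up:

    // Hypothetical per-tab startup jitter: each toolbar instance waits a
    // random delay before its first PageRank lookup, so a 50-tab session
    // restore doesn't hit Google all at once.
    var MAX_JITTER_MS = 60000; // spread the burst over ~1 minute (assumption)

    function scheduleInitialLookup(tabUrl) {
      var delay = Math.floor(Math.random() * MAX_JITTER_MS) + 1;
      window.setTimeout(function () {
        fetchPageRank(tabUrl); // hypothetical helper that does the actual request
      }, delay);
    }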


The toolbar should really only pull the page rank for active tabs. This would effectively only require a single request initially.


> Bots can probably perfectly duplicate the behavior of a toolbar. Only the rate and volume of requests would be different.

If they do, isn't that precisely what Google would want? Isn't it only the rate and volume of requests that are a problem?


There is plenty of malware that finds victims by looking at the results of google searches. Google seems to think that they have an obligation to prevent the indiscriminate spread of self-replicating infovores. Fucking Censorship if you ask me.


They might not want someone to build a large database of pageranks.


OK, wouldn't that take a very, very long time? Google has a database that's probably terabytes big, and if someone really wants that data, can't they do something like what DDG does? I believe they get their search results from Yahoo for free.


Of course they can communicate with each other; they're extensions, not web pages. The first one could act as the 'master' and proxy all the requests.

It's obvious the limiting is rate-based, otherwise this would never have happened; and if it is rate-based, then the toolbar could pace itself to stay below that rate. Of course, that would 'give away' the rate to anyone observing the toolbar during a browser restart, but they could discover it just the same by checking when they get blocked, so that's no loss.

The toolbar knows I'm logged in, knows that the browser has just restarted, and presumably can see how many instances/tabs are open (after all, that's what it provides the info on), so it has all the data at its disposal to make the right decision. This seems like a simple oversight to me (that a user installing the toolbar on a machine with a large number of tabs open would land in this situation).


> Of course they can communicate with each other; they're extensions, not web pages.

Firefox extensions are JavaScript, CSS and XUL, so I don't think that's obvious. I think it's entirely reasonable to assume that they might be sandboxed and have no awareness of each other. Is it one instance of the toolbar per "page-opened" event? Is it one instance per window? What I was describing was a way to stay under the limit without centralized, state-aware rate-limiting code. If that's possible, then yeah, sure, do it that way.

> It's obvious the limiting is rate-based, otherwise this would never have happened; and if it is rate-based, then the toolbar could pace itself to stay below that rate.

It's not obvious to me. I think the issue is that the OP opens 50 tabs simultaneously after a crash and each one opens a connection to Google without a rate limit of any kind. My idea was a way to do it without centralized state.


> Firefox extensions are JavaScript, CSS and XUL, so I don't think that's obvious. I think it's entirely reasonable to assume that they might be sandboxed and have no awareness of each other.

Multiple Mozilla extension instances are indeed able to communicate via some centralised code.


It's possible to do with JavaScript modules: "JavaScript code modules are a concept introduced in Gecko 1.9 (Firefox 3) and can be used for sharing code between different privileged scopes." https://developer.mozilla.org/en/Using_JavaScript_code_modul...
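For what it's worth, a rough sketch of how that could be applied here; every name is hypothetical (the module would have to ship with the extension and be mapped to a resource:// URL via a 'resource' line in chrome.manifest), and this is not how the real toolbar works:

    // rank-queue.jsm -- hypothetical shared module; Gecko loads it once,
    // so every window that imports it sees the same RankQueue object.
    var EXPORTED_SYMBOLS = ["RankQueue"];

    const Cc = Components.classes;
    const Ci = Components.interfaces;

    var RankQueue = {
      _pending: [],
      _timer: null,
      _intervalMs: 2000, // pacing interval; the real 'safe' rate is a guess

      enqueueLookup: function (url, callback) {
        this._pending.push({ url: url, callback: callback });
        if (!this._timer) this._start();
      },

      _start: function () {
        var self = this;
        this._timer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
        this._timer.initWithCallback({
          notify: function () {
            var job = self._pending.shift();
            if (job) job.callback(job.url); // the caller performs the actual fetch
            if (self._pending.length === 0) {
              self._timer.cancel();
              self._timer = null;
            }
          }
        }, this._intervalMs, Ci.nsITimer.TYPE_REPEATING_SLACK);
      }
    };

Each toolbar overlay would then just do Components.utils.import("resource://mytoolbar/rank-queue.jsm") and push its tab's URL onto the shared queue, so the pacing state lives in one place without any master/slave negotiation between instances.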


> Of course that would 'give away' . . .

Meh, I have trouble believing that spammers cannot experiment to find this number out themselves. A binary search on the rate would require only a handful of IPs to pin it down to sufficient resolution for working purposes.
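Roughly like this; probeTriggersBan is a stand-in for "send requests at a given rate from a fresh IP and see whether you get blocked", i.e. the experiment described above, not a real API:

    // Hypothetical sketch: bisect on requests/second to find the ban
    // threshold. Each probe costs at most one IP, so a handful of
    // iterations pins the rate down to working precision.
    function findBanThreshold(probeTriggersBan, lowRate, highRate, steps) {
      for (var i = 0; i < steps; i++) {
        var mid = (lowRate + highRate) / 2;
        if (probeTriggersBan(mid)) {
          highRate = mid; // banned at this rate: the threshold is lower
        } else {
          lowRate = mid;  // survived: the threshold is at least this high
        }
      }
      return lowRate; // highest rate observed not to trigger the ban
    }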


Wouldn't it be simpler to rate limit the toolbar? E.g. it would only send x requests/second (rough sketch below)? Then bots wouldn't emulate it because it wouldn't be able to provide a high enough rate to be really useful.

Of course, that solution is so simple, I'm sure there's a reason it's not possible.
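For concreteness, capping at x requests/second usually means something like a token bucket; this is a generic sketch (names and numbers are illustrative), not anything the toolbar actually ships:

    // Generic token-bucket limiter: roughly `ratePerSec` requests per
    // second on average, with short bursts up to `capacity`.
    function TokenBucket(ratePerSec, capacity) {
      this.ratePerSec = ratePerSec;
      this.capacity = capacity;
      this.tokens = capacity;
      this.lastRefill = Date.now();
    }

    TokenBucket.prototype.tryAcquire = function () {
      var now = Date.now();
      // Refill tokens for the time elapsed since the last check.
      this.tokens = Math.min(this.capacity,
          this.tokens + ((now - this.lastRefill) / 1000) * this.ratePerSec);
      this.lastRefill = now;
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return true;  // caller may send the request now
      }
      return false;   // caller should queue the request and retry later
    };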


You're oversimplifying a real problem. Would you "authenticate" your own toolbar somehow? That's probably excess effort for a temporary 15 minute ban. Would you rate limit it? Well that's a lost cause if I ever saw one; bots can rate limit themselves too.

It's not an easy problem to solve.


Other factors can also come into play, e.g. you could be sitting on an IP subnet where someone else has been scraping Google, or a worm has been sending automated queries to Google.


> you could be sitting on an IP subnet where someone else has been scraping Google

Unlikely; I'm in the sticks, and most people here are old and wouldn't know a mouse from a keyboard.

> or a worm has been sending automated queries to Google.

That would have to be a Linux-based worm then. Unless that suspected worm is sitting on another IP, of course.

Do you want me to try to make it reproducible? I'd happily spend the time if it would help to make this problem go away. I understand how hard it is to differentiate between bots and regular users, but you should be able to pick up the difference between your own toolbar in normal use and a bot.

And if that's not the case then either the battle is 'lost' or it might be better to simply only let the toolbar query the google servers when explicitly asked to do so.


I'd report that to Google as a bug.

http://www.google.com/support/toolbar/bin/request.py?contact...

The official toolbar should not exceed request limits that were designed to prevent PageRank scraping by third-party software.


It's now been over half an hour and the toolbar still doesn't function.

"We're sorry...

... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now."

Right...

I'll try to file a bug with them, but my experience with google and support issues so far does not lead me to believe that anybody will actually read the report.

"I know this form is used to track new issues so I won't receive a response"

Does not give me great hope.


Reminds me of how I used to get banned from reading Google Groups all the time just for opening a bunch of threads in tabs from Google Reader. Now I know that I have to open a couple, read a couple, go back to Reader and repeat. Kind of a shame that people have to act differently just to not be banned as robots.


Yeah, I usually need help to do Google's captchas too. Amazingly hard to convince them you're human.


Generally when this type of thing happens, Google will reply with a captcha that, if you correctly solve it, will let you keep going for a while. I guess toolbar requests might be a little different than web requests.


No captcha to be seen; the IP ban is still in place, and it's now 8 hours later.


Enabling instant search in Chrome did this for me. Pretty sad really.


OP, any chance you could say how many total tabs+windows you had open? That should be useful while this bug is open.

Thanks for the tip!


About 50 in all. And I'm on a 10 Mbit link; possibly on a slower link it would not have triggered. More than an hour has passed now, so I think I'll give up for the day (3:50 am here anyway) and hope that things will have normalized by tomorrow.

What a silly situation to be in.

I could change my IP by calling my provider but it is also entered in a fairly large number of ACLs that will not be updated automatically.


Same thing happened to me with the AutoPagerize extension while incrementally refining my search, because none of the results I was getting back were meaningful. I moved to Duck Duck Go and Bing. If they don't want me to use them for search, then fine; plenty of alternatives. Haven't missed Google so far.


Somebody from http://geotool.flagfox.net/?search=82.128.1.251 hacked into my Gmail a/c




