Official Google Blog: "This site may harm your computer" on every search result?? (googleblog.blogspot.com)
28 points by Anon84 on Jan 31, 2009 | 18 comments



Unfortunately (and here's the human error), the URL of '/' was mistakenly checked in as a value to the file and '/' expands to all URLs.

Wow, one character. That was one expensive character.
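To make the "one character" point concrete, here is a minimal sketch of why a bare '/' is so dangerous. It assumes the blacklist is matched as a prefix of the URL path; the blog post does not say how matching actually works, and `is_flagged` and the example lists are hypothetical names made up for illustration.

```python
from urllib.parse import urlsplit


def is_flagged(url, patterns):
    """Return True if the URL's path starts with any blacklisted pattern.

    Assumption: patterns are path prefixes. The real matching rules are
    not described in the source.
    """
    path = urlsplit(url).path or "/"  # an empty path is treated as "/"
    return any(path.startswith(pattern) for pattern in patterns)


# Hypothetical blacklist with one real entry, and the same list plus "/".
good_list = ["/malware/download.exe"]
bad_list = good_list + ["/"]

urls = [
    "http://example.com",
    "http://example.org/about",
    "http://example.net/malware/download.exe",
]

for url in urls:
    print(url, is_flagged(url, good_list), is_flagged(url, bad_list))
```

Every path begins with '/', so under this assumption the single extra entry flags every URL, not just the bad one.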


Incidentally, I'm glad to know that I can now tell my clients that I have a Google-class QA procedure: I make my changes directly on the live site, without testing them, and I promise not to blow up the home page for more than an hour!

Six sigma, baby!

Seriously, though: We've all made similar mistakes before (though, after the first few times, we usually try to catch them on the dev server or the staging server). The trick for Google will be to ensure that this one is not made again.


It sounds to me as if they did, and as if the badware blacklist implementation was working fine, but was given bad data. Given the rapid proliferation of new badware, it seems reasonable to bulk-load blacklist updates into an existing and tested backend. Obviously, given the scant details, I could be wrong, but my impression is that this is a wonderful demonstration of GIGO.


I'm not sure it's practical to test this on a dev server with each update. All they really need to do is make sure "/" or sites ending with "google.com" and maybe some other white listed sites are never included in the file.
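A check like the one suggested here could be sketched roughly as follows. The entry format ("host/path", one per line) and the name `find_unsafe_entries` are assumptions for illustration only; nothing in the thread says what the real file looks like.

```python
def find_unsafe_entries(entries, protected_suffixes=("google.com",)):
    """Return the entries that should never appear in a blacklist update.

    Assumed format: each entry is "host/path". An entry with no host
    (such as a bare "/") would match everything, so it is rejected, as is
    any entry whose host is a protected domain or one of its subdomains.
    """
    unsafe = []
    for raw in entries:
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        host = entry.split("/", 1)[0].lower()
        if host == "":
            unsafe.append(entry)  # starts with "/", so there is no host
        elif any(host == s or host.endswith("." + s) for s in protected_suffixes):
            unsafe.append(entry)  # protected (whitelisted) site
    return unsafe


update = ["/", "evil.example/bad", "www.google.com/", "notgoogle.com/x", ""]
print(find_unsafe_entries(update))
```

The suffix test here matches on a dot boundary, which is slightly stricter than a plain "ends with google.com", so a host like notgoogle.com is not protected by accident. A push script could refuse to ship any update for which this returns a non-empty list.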


I'm not sure it's practical to test this on a dev server with each update.

I tell myself that all the time. And then I pause, and sigh, and load up the dev site anyway, just to make sure the home page has not exploded. Because sometimes it has. Typos happen. Brain farts happen.

Obviously I wouldn't necessarily expect Google to test a dev server by hand every time they change some code. That procedure is for little people like me. [1] For something as important as Google I'd expect there to be a procedure whereby each new change gets pushed to a small group of boxes, which then run a few really simple automated acceptance tests ("If I do a search for a random term, do I get back a page with links that can actually be followed?") before the thing gets pushed live fifteen minutes later.

Apparently I expect too much. Perhaps I'm overengineering this. Maybe it's okay if, a couple of times a decade, we just cripple half the Internet for 30 minutes and give 5% of the world's computer users a virus scare.

---

[1] At least until I get lazier (in the Larry Wall sense of the word) and write some more scripts.
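The staged push described in this comment might look something like the sketch below. Everything in it is hypothetical: boxes are plain dicts, `apply_blacklist` and `search_has_followable_link` stand in for whatever a real deployment would do, and the fifteen-minute soak time and any rollback of the canary boxes are left out.

```python
def apply_blacklist(box, blacklist):
    """Install a new blacklist on one (simulated) box."""
    box["blacklist"] = list(blacklist)


def search_has_followable_link(box):
    """Toy acceptance test: does a search return at least one unflagged link?

    Assumption: results are URL paths and the blacklist holds path prefixes.
    """
    sample_results = ["/about", "/news/today"]

    def flagged(path):
        return any(path.startswith(prefix) for prefix in box["blacklist"])

    return any(not flagged(path) for path in sample_results)


def staged_push(change, canaries, fleet, apply_change, acceptance_test):
    """Push to the canary boxes first; go live only if every canary passes."""
    for box in canaries:
        apply_change(box, change)
    if not all(acceptance_test(box) for box in canaries):
        return False  # stop here; the rest of the fleet is untouched
    for box in fleet:
        apply_change(box, change)
    return True


canaries = [{"blacklist": []}]
fleet = [{"blacklist": []}, {"blacklist": []}]

# The bad update is caught on the canary and never reaches the fleet.
print(staged_push(["/"], canaries, fleet, apply_blacklist, search_has_followable_link))
# A sane update passes the acceptance test and goes everywhere.
print(staged_push(["/malware/"], canaries, fleet, apply_blacklist, search_has_followable_link))
```

The point of the design is that the acceptance test checks behavior ("can a user still follow a result?") and not the contents of the file, so it would catch bad data even when the code itself is fine.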


I've had production failures on very critical systems when files were edited and silently had their \n line endings replaced with \r\n line endings. Once we found the problem it was simple enough to fix, and we changed the files in question to be robust against such changes in the future, but that was an... unpleasant day.
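The comment does not say how the files were made robust, so the following is only one possible approach: parse the file with a line splitter that accepts both \n and \r\n and strip whatever whitespace is left. `read_entries` is a hypothetical helper name.

```python
def read_entries(data: bytes):
    """Parse one entry per line, tolerating both \\n and \\r\\n endings."""
    text = data.decode("utf-8")
    # str.splitlines() splits on \n, \r\n and \r alike; strip() drops any
    # stray whitespace, and blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


unix_file = b"alpha\nbeta\n"
windows_file = b"alpha\r\nbeta\r\n"

# A naive split on "\n" leaves a trailing "\r" on each Windows-style entry,
# which is the kind of silent difference that breaks exact comparisons.
print(windows_file.decode("utf-8").split("\n"))

# The tolerant parser gives the same result for both files.
print(read_entries(unix_file) == read_entries(windows_file))
```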


An embarrassing flub, sure, but perhaps not expensive. By scaring people away from natural results, Google's ad clickthroughs may have gone up during the glitch!


She wanted dinner and a movie.


This could be a PR nightmare for Google. On top of the Epic Google Fail, they could not even come up with an accurate blog post: http://blog.stopbadware.org/2009/01/31/google-glitch-causes-....

It will be interesting to see the stories that come up when the regular news cycle begins again.


Interestingly, the StopBadware blog is now down - I can't get a response from it. I guess they got GoogleDotted.


Well, StopBadware got screwed twice then. First, their site was overloaded because every Google user who attempted to view a search result URL was taken to the interstitial page, which links to their site... bad news. Their site is still down as of this comment: http://stopbadware.org

Now I suspect their blog is down because Google linked to their blog.

The blog post basically pointed out some inaccuracies in the original version of the Google blog post.


This is a risk you run when you have continuous deployments. Luckily, incremental rollout, sufficient unit testing, and other safeguards mitigate the problem, but sometimes things are still going to slip through. I'd bet that significantly more problems have occurred, but this is simply the first big, obvious one worth talking about.

This goes back to what I was preaching earlier about Microsoft and Google's varied approaches to testing. If this sort of bug got burned onto a disk and shipped to a million customers, it wouldn't have been a 40-minute problem... Sadly, Microsoft's testing culture bleeds into its ailing web culture and is becoming less effective in the modern auto-update world. See: Google grows impatient, so they patch Chrome with a workaround for Hotmail. Meanwhile, the Hotmail team is waiting for the next scheduled release. Lucky for Google, Chrome is mostly useless without an active internet connection, and that same connection is what lets them push updates.


Someone should make the case for a separate test / dev environment to the big G :)

Seriously though, I wonder how many programmers / site builders are going to use this as an excuse for their own goofs....

"Look, if Google can be down for half an hour due to one slash, you really shouldn't be angry with me for [insert some terrible mistake here]."


Isn't StopBadware Niels Provos' Google 20% project?


I wonder if that cost him his job.


I sure hope Google is more forgiving than that... in the grand scheme of things this is a very minor screw-up.


As a former Google employee, I can assure you that there have been far bigger screw-ups than that... and it's just part of the learning experience. The bigger the screw-up, the more money you've invested in training them :) (e.g., when I was there, one man single-handedly brought down 10,000 machines at once in a hard reboot... fortunately, because of the way Google replicates and distributes things, the world was none the wiser. It really gave me great respect for and trust in their infrastructure, and no, he wasn't fired.)


Wow. Someone broke the Google by slashing on it.



