
Unfortunately (and here's the human error), the URL of '/' was mistakenly checked in as a value to the file and '/' expands to all URLs.
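To see why a lone '/' is so destructive, here's a toy illustration (the prefix-match semantics are my assumption about how the file is interpreted, not Google's actual code): every URL path begins with '/', so a '/' entry flags everything.

    from urllib.parse import urlparse

    blacklist = ["/"]   # the bad check-in: one character

    def is_flagged(url):
        # Assumed semantics: an entry flags any URL whose path starts with it.
        path = urlparse(url).path or "/"
        return any(path.startswith(entry) for entry in blacklist)

    print(is_flagged("http://example.com/"))                    # True
    print(is_flagged("http://news.ycombinator.com/item?id=1"))  # True -- everything matches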

Wow, one character. That was one expensive character.




Incidentally, I'm glad to know that I can now tell my clients that I have a Google-class QA procedure: I make my changes directly on the live site, without testing them, and I promise not to blow up the home page for more than an hour!

Six sigma, baby!

Seriously, though: We've all made similar mistakes before (though, after the first few times, we usually try to catch them on the dev server or the staging server). The trick for Google will be to ensure that this one is not made again.


It sounds to me as if they did, and as if the badware blacklist implementation was working fine, but was given bad data. Given the rapid proliferation of new badware, it seems reasonable to bulkload blacklist updates to an existing and tested backend. Obviously, given the scant details, I could be wrong, but my impression is that this is a wonderful demonstration of GIGO.


I'm not sure it's practical to test this on a dev server with each update. All they really need to do is make sure "/", sites ending in "google.com", and maybe some other whitelisted sites are never included in the file.
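A pre-publish sanity check along those lines would be enough. A sketch only; the file format and the whitelist here are my assumptions, not Google's actual setup:

    WHITELIST_SUFFIXES = ("google.com", "youtube.com")   # hypothetical whitelist

    def validate_blacklist(path):
        """Refuse to publish a blacklist file containing obviously catastrophic entries."""
        with open(path) as f:
            for lineno, entry in enumerate(f, 1):
                entry = entry.strip()
                if not entry or entry.startswith("#"):
                    continue
                if entry == "/":
                    raise ValueError(f"line {lineno}: bare '/' would flag every URL")
                host = entry.split("/", 1)[0]
                if host.endswith(WHITELIST_SUFFIXES):
                    raise ValueError(f"line {lineno}: whitelisted host {host!r} in blacklist")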


I'm not sure it's practical to test this on a dev server with each update.

I tell myself that all the time. And then I pause, and sigh, and load up the dev site anyway, just to make sure the home page has not exploded. Because sometimes it has. Typos happen. Brain farts happen.

Obviously I wouldn't necessarily expect Google to test a dev server by hand every time they change some code. That procedure is for little people like me. [1] For something as important as Google I'd expect there to be a procedure whereby each new change gets pushed to a small group of boxes, which then run a few really simple automated acceptance tests ("If I do a search for a random term, do I get back a page with links that can actually be followed?") before the thing gets pushed live fifteen minutes later.

Apparently I expect too much. Perhaps I'm overengineering this. Maybe it's okay if, a couple of times a decade, we just cripple half the Internet for 30 minutes and give 5% of the world's computer users a virus scare.

---

[1] At least until I get lazier (in the Larry Wall sense of the word) and write some more scripts.
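For the record, the kind of acceptance test I mean is maybe twenty lines. Something like this, where the endpoint, query terms, and checks are all placeholders rather than anything Google actually runs:

    # Rough sketch of the "did we just break search?" smoke test.
    import random, re, sys, urllib.request

    def smoke_test(base="http://localhost:8080/search"):
        term = random.choice(["weather", "news", "python", "pizza"])
        with urllib.request.urlopen(f"{base}?q={term}", timeout=10) as resp:
            assert resp.status == 200, f"search returned {resp.status}"
            body = resp.read().decode("utf-8", errors="replace")
        links = re.findall(r'href="(https?://[^"]+)"', body)
        assert links, "no followable result links on the page"
        # Follow one result to make sure an interstitial isn't eating every click.
        with urllib.request.urlopen(links[0], timeout=10) as resp:
            assert resp.status == 200, f"result link returned {resp.status}"

    if __name__ == "__main__":
        try:
            smoke_test()
        except AssertionError as e:
            sys.exit(f"FAIL: {e}")
        print("OK")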


I've had production failures on very critical systems when files were edited and their \n line endings were silently replaced with \r\n. Once we found the problem it was simple enough to fix, and we changed the files in question to be robust against such changes in the future, but that was an... unpleasant day.
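For what it's worth, the usual way to make a parser indifferent to this is to stop splitting on a literal \n. A generic sketch, not the poster's actual system:

    def read_entries(path):
        # Read bytes so the platform doesn't translate line endings behind our back,
        # then let splitlines() accept \n, \r\n, or \r equally.
        entries = []
        with open(path, "rb") as f:
            for raw in f.read().splitlines():
                line = raw.decode("utf-8").strip()
                if line and not line.startswith("#"):
                    entries.append(line)
        return entries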


An embarrassing flub, sure, but perhaps not expensive. Google's ad clickthroughs may even have gone up during the glitch, since people were scared away from the natural results!


She wanted dinner and a movie.



