Google on today’s massive Google+ spam influx: “We ran out of disk space” (venturebeat.com)
67 points by suneliot on July 10, 2011 | 41 comments



Interesting. Given the minimum size of a Colossus cluster, I find the explanation Vic gives unsatisfying. That being said, I'm pretty impressed with the overall product; it is the best attempt yet to unseat Facebook, and I predict it will if Facebook can't come up with a credible response quickly.

The killer feature is that it's blended with Gmail, and since a lot of people keep Gmail open all the time, you get notified and you see stuff. It's not as common to keep one's Facebook page open.


since a lot of people keep gmail open ... Not as common to keep one's facebook page open.

Do you have data on this? I'd think it would be close, if not the opposite.

Edit for clarification:

In my social circles, what you said is probably true, but I know many others use FB messages more than email (any email, not just Gmail). Facebook has more pageviews total, and Facebook has probably around 3X as many active users (I couldn't find numbers from the same point in time: Gmail was at around 200 million last November per the WSJ, Facebook was at 500 million last July and 750 million last week).

Don't get me wrong, I agree with you that G+ notifications in Gmail (and on Google Search!) are hugely powerful, and will mean G+ engagement among its users will stay quite high. But when you talk about "unseating" Facebook, you have to first come to grips with just how entrenched it is, compared to social networks that rose and fell before it. It is so entrenched that Gmail integration alone will not be enough - Facebook is substantially bigger than Gmail.


You are correct in that I was generalizing when I should not have been.

I've used Google Apps for work, and since a lot of communication comes through email, Gmail stays open (or at least I get notifications through the Talk gadget). I would not be surprised if you were correct that many folks leave Facebook open all the time.


It's worse than that: among my non-technical friends, email is used only for "serious stuff" such as talking with teachers (or with students, for my friends who teach) and for work. For everything else, Facebook messages have supplanted email.

I am obviously not sure this is a global trend, but I keep applying the test "who have you received an email from recently who isn't work or a fellow technical person?" The results are somewhat scary.


Yeah, email is now the snail-mail of the mid-2000s.


Interesting; this means the spammers will have to change venues to reach that part of their demographic. That should give them something to think about.


Is this a joke? When I see people using their phones in public, the #1 app, bar none, that I see open is Facebook. People are just browsing their feeds, commenting, liking, etc. This is mostly on iOS devices, but increasingly on Android as the FB app has matured.

750 million people around the world use Facebook at least once a month. Of that number, HALF return every single day, and 80% use it at least once a week. Facebook is a big deal that has deeply embedded itself into the lives of many, many people. "Checking your Gmail" is a completely different kind of interaction.


Tangential, but the Facebook app on Android has done anything but mature. It's been stagnant for the last 6-8 months while the web interface has come on in leaps and bounds.

At this point in time, I see no reason other than persistence of data (FB messages) to use the app.


How do you know "the minimum size of a Colossus cluster"? Are you a Google employee leaking information?


Former employee, no.


I guess dropping tidbits of inside information for HN karma will never go out of style. NDAs still apply after you leave; be careful...


Not so long ago hard disk space was abundant, then programmers realized hard disk space was abundant.


Shouldn't there be a Beta tag on the Google Plus icon? Perhaps they left it on Gmail for so long that they figured nobody would pay attention to it anymore.


Some quick napkin math on numbers:

If you're to believe Eric Schmidt when he says 'millions' and put that at 3 million users (being generous), guessing that each user uploads 20 megabytes of content (again, generous), that's:

3×10^6 users × 20 MB/user = 6×10^7 MB = 60 TB

Can someone sanity-check this, or does that not seem like a lot of resources to allocate to a project of this size?

Edit: formatting
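
To sanity-check that arithmetic, here is a minimal sketch in Python (the 3 million users and 20 MB per user are the same guesses as above):

    # Napkin math: raw storage for early Google+ content (all inputs are guesses)
    users = 3 * 10**6            # "millions" per Eric Schmidt, taken generously
    mb_per_user = 20             # guessed average upload per user, in MB
    total_mb = users * mb_per_user
    total_tb = total_mb / 10**6  # decimal units: 1 TB = 10^6 MB
    print("%.0f TB" % total_tb)  # -> 60 TB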


I don't think this has anything to do with how much content a user uploads. They specifically said that “For about 80 minutes we ran out of disk space on the service that keeps track of notifications.”

So they probably allocated an amount that made perfect sense while testing but was too low for the full rollout that (effectively) seems to be happening now.


Quite simply, no. But if it was notifications, they were probably going through 'Chubby' [1], which is sometimes used as a scratchpad for notifications. And if you read the interview with Sean Quinlan [2], you can get a sense that 60TB is quite small.

[1] http://labs.google.com/papers/chubby.html

[2] http://queue.acm.org/detail.cfm?id=1594206


Maybe some heavy logging in this field test phase could explain it? Indeed, 60TB sounds like very little. I thought Google would use their GFS system to back this kind of thing, so disk space wouldn't be an issue. But what do I know.


60TB? That's about a couple grand's worth of space. Why would that be a lot of space to Google?


Even using 3TB consumer-grade SATA disks you'd need 20 of them, and an enclosure for 20 disks costs a lot more than a couple of grand.



That's a chassis for 2.5" disks, so you'd be looking at 60x 1TB disks, and that would mean 3 of those enclosures. Now rack them somewhere and add power - still much more than a couple of grand, and we haven't even paid for the disks yet...


Yeah, I was a tad hyperbolic in just referring to the disks. I would expect the costs to be around $20/GB/year when you also factor in power - bigger drives are making a difference, but the other factors always cost more than the disks themselves.

It doesn't change the fact that 60TB is tiny for a company whose every product involves storing enormous quantities of data and serving them at monstrous scale.


And according to the GFS paper there are three copies of every chunk in a GFS cluster, so that is 180TB, and they probably don't depend on one GFS cluster to meet their availability guidelines, so with two clusters that is really 360TB (180TB * 2).

And the amazing part is that if you attend an open event where Google talks about their infrastructure in general terms, you will realize that this has to be mouse nuts compared to the amount of 'spinning rust' they have going at any one time.
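
Carrying the thread's numbers through, a rough sketch of the raw capacity once replication and a second cluster are factored in (the replica and cluster counts are the assumptions above, not confirmed figures):

    # Raw disk implied by the 60 TB estimate, using the thread's assumptions
    logical_tb = 60        # estimate from upthread
    gfs_replicas = 3       # three copies of every chunk, per the GFS paper
    clusters = 2           # assume a second cluster for availability
    raw_tb = logical_tb * gfs_replicas * clusters
    disks_2tb = raw_tb // 2                              # using 2 TB drives
    print(raw_tb, "TB raw,", disks_2tb, "x 2TB disks")   # -> 360 TB raw, 180 x 2TB disks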


Absolutely agree!


Sorry, I posted the wrong chassis. Look for the SC847; that's 45x 3.5" drives in 4U. With 32x 2TB SATA you're looking at roughly $5000 (incl. disks).

And Google likely gets them quite a bit cheaper than that.


Eh, I was definitely lowballing a lot with "a couple grand." You also need to factor in power and replication.


Not sure what you mean. It's still a couple grand when you factor in power and replication. ;-)

For Google it's a rounding error either way. They measure in petabytes, not terabytes.


I have on the order of six terabytes knocking around in my house. I think the chocolate factory can store a few orders of magnitude more than me before it starts to sweat.


So I guess "field testing" is the new beta testing? Google watered down beta by applying it to finished products to the degree that they've had to invent a new term to fill its function.


I still have this happen, and every time I tell myself: right, today I'm going to put in a warning system that'll stop this from happening... and then I clear up a ton of space and move on.

HTTP.sys logging has been a real pain in my ass; every time I deploy a new server I forget to disable it, and we do so much traffic that it fills the drive completely overnight.


Sounds like you need to automate setting up a new server.


Or at least automate the logging process: a 5-minute log monitor that compresses old logs, maybe moves them off the system onto a cloud server once space starts to run out, and e-mails the people responsible.
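
A minimal sketch of that kind of watchdog in Python; the log directory, threshold, and e-mail addresses are all hypothetical, and the "move to cloud storage" step is left as a stub:

    import gzip
    import os
    import shutil
    import smtplib
    from email.message import EmailMessage

    LOG_DIR = "/var/log/myapp"   # hypothetical log directory
    FREE_THRESHOLD = 0.10        # act when less than 10% of the disk is free

    def check_disk_and_rotate():
        usage = shutil.disk_usage(LOG_DIR)
        if usage.free / usage.total >= FREE_THRESHOLD:
            return
        # Compress old logs in place to buy some room.
        for name in os.listdir(LOG_DIR):
            if not name.endswith(".log"):
                continue
            path = os.path.join(LOG_DIR, name)
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
            # TODO: optionally ship the .gz off to cloud storage here.
        # E-mail the people responsible.
        msg = EmailMessage()
        msg["Subject"] = "Disk space low in %s" % LOG_DIR
        msg["From"] = "ops@example.com"
        msg["To"] = "oncall@example.com"
        msg.set_content("Free space dropped below %d%%." % (FREE_THRESHOLD * 100))
        smtplib.SMTP("localhost").send_message(msg)

    if __name__ == "__main__":
        check_disk_and_rotate()  # run from cron every 5 minutes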


Yep, that would help; notifications are built into about half my platform now... gradually getting them more robust. :)


That also doesn't quite happen enough to make it a priority!


Running out of disk space is probably the single biggest category of server problem that occurs.


It seems that today and yesterday (for me and those I have invited, anyway) there has been no issue with creating an account straight away. So I'm guessing this has sent the total number of users on the service up pretty fast.


Running out of disk space is devops 101; it is high time these colossus startups start publishing which CMM level they meet.


Are you joking?


No, I am pretty serious; downvotes don't change reality. There are apps/services which sign up for 99.999% availability.

Given their expertise in managing 200 million+ Gmail accounts, I would consider this a bad fumble at 5 million+ users.


Mentioning devops undermines your credibility. What service is 5 nines? How is that measured?

In general, Google SRE just gets it done. Sometimes people screw up. It happens.


Oh, I was referring to you asking about so-called "CMM level" when we're talking about Google, not some second-hand suit-run IT company.



