Ma.gnolia Data is Gone For Good (datacenterknowledge.com)
43 points by Anon84 on Feb 19, 2009 | 34 comments



"It turns out that Ma.gnolia was pretty much a one-man operation, running on two Mac OS X servers and four Mac minis."

This should be a lesson: entrust your systems administration to people who have experience in this stuff. I'm not saying you have to find an expert and pay him or her expert prices; instead, you probably have a developer or admin friend who knows what it takes to implement a basic backup strategy that will at least keep you from losing all of your data.

I watched a little bit of the podcast and he said he was just syncing the db files... well, if it's InnoDB, that's not going to work. It says so in the MySQL docs. You know, in the section about backups.

The lesson here could be this: if you have a great idea, get it developed and out the door -- awesome. Now talk to someone with experience in systems planning. Don't just throw a bunch of Mac minis in a cabinet, bust out an rsync script and call it a day.
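For the record, a working nightly dump for an InnoDB database isn't much code. A rough sketch in Python (database name, paths, and credential handling are all made up for the example; the point is --single-transaction rather than copying live files):

    # Nightly MySQL dump sketch -- names and paths are invented.
    # --single-transaction gives a consistent snapshot of InnoDB tables without
    # locking the live site; rsyncing the raw files of a running server does not.
    import datetime
    import gzip
    import subprocess

    def nightly_dump(db="magnolia_production", out_dir="/var/backups/mysql"):
        stamp = datetime.date.today().isoformat()
        out_path = "%s/%s-%s.sql.gz" % (out_dir, db, stamp)
        dump = subprocess.Popen(
            ["mysqldump", "--single-transaction", "--quick", db],
            stdout=subprocess.PIPE)
        with gzip.open(out_path, "wb") as out:
            for chunk in iter(lambda: dump.stdout.read(64 * 1024), b""):
                out.write(chunk)
        if dump.wait() != 0:
            raise RuntimeError("mysqldump failed for %s" % db)
        return out_path

Ship the resulting file somewhere off the box and you at least have something you can restore from.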


I don't want to throw rocks, because I used to work for Larry (in fact, I wrote the original codebase for Ma.gnolia.com) and I like him.

But I distinctly remember talking about backups and how a raw sync of the innodb files wasn't really what we needed (it could sort of work in the original deploy of ma.gnolia, but would not scale as the site scaled). I'm kind of surprised that the community has been so tolerant of this failure. Personally, I'd be furious if I found out that a site I trusted lost all my data because they hadn't even tested their backup system.


> I'd be furious if I found out that a site I trusted lost all my data because they hadn't even tested their backup system

Exactly - first rule of backups - test them. For a database, it can be as simple as using your nightly backup to create your dev or test database or whatever. I guess it isn't feasible with a half-terabyte database every day, but to not have at least a several-week-old backup set is somewhat unforgivable.

On the other hand, I bet he will never ever make the same mistake again, and it will make a lot of people on here think long and hard about their own backup strategies ...


I've seen that kinda setup done nicely...

Daily automated DB backup. Daily automated DB restore to a separate server. (Which can also kinda double as a hot spare as required.)
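As a rough sketch, the restore half can be as small as this (host, database, and table names here are hypothetical):

    # Load last night's dump into a spare MySQL instance and run a sanity query.
    import subprocess

    def restore_and_check(dump_path, host="standby", db="restore_test"):
        # Drop and recreate the scratch database on the spare box
        # (the drop is allowed to fail if the database doesn't exist yet).
        subprocess.call(["mysqladmin", "-h", host, "-f", "drop", db])
        subprocess.check_call(["mysqladmin", "-h", host, "create", db])
        # Stream the gzipped dump straight into mysql.
        gunzip = subprocess.Popen(["gunzip", "-c", dump_path], stdout=subprocess.PIPE)
        subprocess.check_call(["mysql", "-h", host, db], stdin=gunzip.stdout)
        gunzip.wait()
        # A restore only "counts" if real rows come back out of it.
        rows = subprocess.check_output(
            ["mysql", "-h", host, "-N", "-e", "SELECT COUNT(*) FROM bookmarks", db])
        assert int(rows.strip()) > 0, "restore produced an empty bookmarks table"

If that count ever comes back zero, you find out the next morning, not the week the primary dies.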


You may not want to get into this, but why was the database half a terabyte anyway? That seems like a shitload of data for a site like ma.gnolia.

My best guess is the link screenshots were being stored in the DB.


Or could it be that half a gig was just a figure he came up with to demonstrate the (ostensible) difficulty in doing a proper backup?


Well, half a gig is trivial to back up. Half a terabyte is not. Assuming you meant to say 500 gigs, then yes, there is difficulty, but it's not uncharted territory. He had two Xserves -- perfect for replication.


It's not even close to uncharted territory. You can walk into any Apple store and buy a 1-terabyte backup device off the shelf.

OK so it won't handle your databases properly but that's not the point; 500G is not a troublesome amount of data these days.


No, the point is that backing up a 500G database is difficult, but not unpossible. Sorry if that wasn't clear.


Is this a good case for cloud computing services?


I think it doesn't matter what your infrastructure looks like; the principles are the same. You should not rely on a single service to store your data and/or do backups properly for you.

For example, I am building an application on top of EC2 (IaaS) and utilizing the nice EBS system with snapshots to S3.

This is still do-it-yourself in many ways: you still need to understand the database system, read those docs, and test restores, etc. It's not all that different from the Ma.gnolia situation. The main difference is that you get some nice properties from EBS and S3 redundancy.
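For what it's worth, the snapshot step itself is tiny. A minimal sketch with the current boto3 client (the 2009-era boto API looked different; the region and volume ID are placeholders):

    # Take a point-in-time EBS snapshot of the database volume.
    import datetime
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def snapshot_db_volume(volume_id="vol-0123456789abcdef0"):
        desc = "nightly db snapshot %s" % datetime.date.today().isoformat()
        snap = ec2.create_snapshot(VolumeId=volume_id, Description=desc)
        # The snapshot data lands in S3 behind the scenes, but it's still a
        # single-provider copy of your data.
        return snap["SnapshotId"]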

But even with those nice properties, like hell am I going to hang the crux of my business on S3 snapshots only. Besides technical failures (there have been S3 data inconsistencies reported), what if they cancel our account? Etc. etc.

Another thing people label "cloud computing" is something like Google's AppEngine (PaaS). Here, you trust Google entirely to store the data safely in the "datastore", but it's only integrity that they try to supply, not backups. I have never heard of a way to get Google to take snapshots (which are important when you or an attacker delete something that shouldn't have been deleted, etc.).

You have to manually back up off of AppEngine, as in: http://code.google.com/appengine/articles/gae_backup_and_res...

But my real point is that you should do something like that anyhow.

Cloud computing is still a single point of failure because it's a "single company of failure". It may address the SPOF problems that come with buying and running hardware, but I'm seeing people treat that as a license to throw caution to the wind. Instead of "hard disks will fail, so know that you will need contingency plans", the mindset if you choose to go the cloud computing route should perhaps be "trust the redundancy they try to supply, but be sure to have a contingency plan, because there is always some possibility you will be hosed."

If you're building with something higher level than Google AppEngine, or Engine Yard, etc., the principle still applies. Even up at the consumer level: look at the people "building their bookmarks" at that higher level; they lost a lot of their hard work because they trusted a single company/system.

As a business, I don't ever want to put someone in the position where they realize they should not have trusted one system. That erodes any trust they had in our brand and is just shitty all around. And the only way to begin on that path is to not put that kind of trust in a single system yourself.


Only in the sense that "Someone robbed an unlocked house" is a good case for sniper teams. I mean, sure, it will work -- but as an interim measure, locking the door is a bit easier, almost as effective, and far less messy.


Here is a tip - don't hire consultants to talk to you about blog marketing and microformats etc. until you have the basics like backup and support in order.

Magnolia hired such consultants and was engaged with them for a while. I know because I am an advisor to a company that hired the same consultants - and I made the same argument to them, i.e. there are more basic things that need to be completed before good money is spent on people who love to talk about blogs, FOAF, OAuth, OpenID, syndication, online marketing, etc.

With the money Magnolia would have saved from these consultants, they could have really had their site in order. I also don't know how you can have so many employees, advisors and 2-3 consultants on tap and nobody mentions backup.

So other startup founders, forget about all that fancy 2.0 stuff - get the essentials in order first.


I have a wonderful data backup story.

I was backing up one nearly-full 1TB Seagate Freeagent to another one, and after 180GB of data had transferred, I knocked one of them over by mistake; it fell 15 cm to the floor and died!

I sent the broken Freeagent to Seagate data recovery, but unsure if it would be recoverable, I had this amazing brainwave.

Before I backed up one Freeagent to the other, I had quick-formatted it (note: quick, not full). So when one of the drives broke, the brainwave was: why not undelete the remaining data?

So I found a utility on the web to unformat and undelete, and the only data left from the copying was off a spare 3.5" drive, and I got back all but 2 weeks of data.

I was lucky. Now I have 3 Freeagents to serve my backup needs.

However, all my core data is now on some 2.5" drives, which are more reliable than a 3.5" Freeagent.


The lesson as always: Keep your Freeagent drive on the floor.

I've knocked mine over more than a few times, but since I've got it sitting on the floor, there have been no issues.


datacenterknowledge.com seems to be having some issues right now (the HN effect? ;) Here is the gist of the post:

     The social bookmarking service Ma.gnolia reports that 
     all of its user data was irretrievably lost in the 
     Jan. 30 database crash that knocked the service 
     offline. That means that users who were unable to 
     recover their bookmarks through publicly available 
     tools (including other social media sites and the 
     Google cache) have lost all their data.



     Ma.gnolia founder Larry Halff said last week that the 
     service’s MySQL database included nearly half a 
     terabyte of data. Yesterday Halff informed users that 
     a specialist had been unable to recover any data from 
     the corrupted hard drive. “Unfortunately, database 
     file recovery has been unsuccessful and I won’t be 
     able to recover members’ bookmarks from the Ma.gnolia 
     database,” he wrote.


Data Center Knowledge had another story (Exploding Servers) on Slashdot at the same time it was getting the HN traffic, so a double-whammy.


Meaning no disrespect to HN, but given that we seem to send about 4k visitors a day in my recent experience, getting linked by Slashdot and HN on the same day is a single-whammy.


The website is down (at least for me). But there's also some information on the official ma.gnolia site: http://ma.gnolia.com/


I seriously didn't expect this from magnolia.

How hard/expensive is it to get an automated daily backup of the DB and sync it up to something like Amazon S3? I run a similar pet project (pardon the shameless plug :) http://tagz.in ), albeit with only a dozen people actively using it. I maintain a 30-day rolling backup of daily db dumps, paying next to nothing for the backups. I've had a number of people ask how they can trust me if I decide to stop the whole thing, and part of the plan was to allow people to download their bookmarks in Netscape bookmarks format for at least 30 days from the day I announce it, just in case I decide to take it down or run out of money. Not that this is ever gonna happen (to be honest, I could probably run it for the next 6 months even if I lost my job this very day), but I feel contingencies like these need to be thought about well in advance when we're dealing with users' data (bookmarks in this case).

PS: I use PostgreSQL and have a script on my laptop which syncs my local db with the latest db snapshot every week. That isn't really a good idea, but it works for me, given the tiny scale I'm working at.
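For anyone curious, the whole daily-dump-plus-S3 arrangement is only a few lines of Python. A rough sketch of the shape of it (bucket, database, and path names are invented, and it uses the current boto3 client rather than what I actually run):

    # Daily pg_dump uploaded to S3, keeping a 30-day rolling window.
    import datetime
    import subprocess
    import boto3

    BUCKET = "tagz-db-backups"   # hypothetical bucket name
    KEEP_DAYS = 30
    s3 = boto3.client("s3")

    def daily_backup(db="tagz"):
        today = datetime.date.today()
        local = "/tmp/%s-%s.dump" % (db, today.isoformat())
        # -Fc writes a compressed, custom-format archive that pg_restore understands.
        subprocess.check_call(["pg_dump", "-Fc", "-f", local, db])
        s3.upload_file(local, BUCKET, "daily/" + local.split("/")[-1])
        # Prune anything that has aged out of the rolling window.
        cutoff = today - datetime.timedelta(days=KEEP_DAYS)
        listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="daily/")
        for obj in listing.get("Contents", []):
            if obj["LastModified"].date() < cutoff:
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])

The weekly laptop sync in the PS above doubles as a crude restore test, which is better than nothing.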


Hardish. Expensive when you're talking about terabytes.

Regardless of that, though, ma.gnolia had backups; it's just that they were untested and were dutifully backing up the corruption that was being introduced at the software level.

It's that bit I'd like to know more about. Were they faithfully following the Rails Way and expecting the model to handle validations, instead of setting up the database to do double validation? How did these errors creep in, and how did they grow to be so catastrophic?


This corruption doesn't have anything to do with the "Rails Way". The corruption was at a much lower level than things like nil foreign keys, long fields, etc. that Rails validates. It was the filesystem that was corrupted, and whether he did validation in ActiveRecord, in the database, or both would not have made a difference, as far as I can tell.

Think about it for a second - if the corruption that caused this was something that could be validated using Ruby or in SQL declarations, it would be more than feasible to import that data into some format from which it could be recovered.


Didn't Reddit lose user account data at least once after start-up (as I seem to recall from personal experience)?

I know that MSN lost a lot of user account data soon after its start-up, way back in the 1990s. Preserving data for an online community is not easy, even for richly funded market entrants.


They had a laptop stolen that had a backup of their database. After it was stolen they let users know about the theft and that passwords were stored as plain text. I don't believe any data was ever lost.


But in every other major case I've heard of, it has been properly backed up.


I was a charter subscriber to MSN, and I never got my email address back. So I stopped using MSN.


Oh! I hadn't known.


I just feel so sorry for the Magnolia guy; what a waste of hard work!

He must be feeling like shit right now; I know I would.


Was Magnolia profitable? Did it run ads?


No, and only occasionally. For the approximately 2 years that I used ma.gnolia it ran ads for less than half of the time.


This makes me wonder about the backup systems of sites like TinyURL.

Imagine all those shortened URLs everywhere becoming useless!


A terrible tragedy of epic proportions, to be sure.


As if millions of tweets were suddenly silenced.


Was this a fried hard drive problem? If so, maybe SpinRite would help?



