Storing 25 petabytes of Megaupload data costs us $9,000 a day (arstechnica.com)
313 points by ttt_ on March 22, 2012 | 133 comments



I've dealt with e-discovery sets. No one really has answers about what to do when you have a litigation hold on data. Legislation commonly requires "retention of anything related to X case", but how do you know what's relevant and what isn't? When you are a third party the ambiguity increases. So you end up with an everything-and-the-kitchen-sink data dump. Even with everything, the data is commonly useless without context. You have files without access logs, logs referencing local namespaces, etc...

With a 25 petabyte discovery, I'm not surprised that everyone's scratching their heads on what to do next. This isn't just an MPAA/Megaupload problem. Even a smaller dataset like a 10-20TB discovery has numerous problems. Hosting/indexing/classifying/reviewing millions of documents is an open issue for the legal field. What do you do when there are multiple parties who all need to see "everything"? If everyone does their own thing how do you reference materials in a consistent manner across the interested parties? If you all agree to host the data in a neutral place who pays for it? What if the technology of that host benefits one party at the expense of another?

For years the legal field has had a "print it all out and have a team of paralegals go over it" viewpoint. Clients don't pay for computers, but they do pay for paralegal hours. Only recently has that become untenable. Discovery sizes are growing exponentially year over year. It's common to have a new discovery set come in larger than every previous set combined, and the legal industry doesn't really know what to do about it.


If it's anything like the usual proceedings, the lawyers involved would probably prefer if they had several copies of the data, too.

I'm sure most of these drives were arrayed in such a way that they're unreadable except in the proper equipment. It's not like you can just buy a pile of off-the-shelf external drives and start copying, either, as the contents might be unreadable unless the proper software is installed and configured correctly.


Not to mention that it would cost ($109 per TB @ Newegg) * (25,000 TB) = $2,725,000 just to replicate it.
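The replication figure above is a single multiplication. A minimal sketch, assuming (as the comment does) that 25 PB is counted as 25,000 TB and that capacity costs a flat $109 per TB with no bulk discount:

```python
# Rough cost of buying raw disk capacity for one extra copy of the data.
# Assumptions (from the comment): 25 PB counted as 25,000 TB,
# a flat $109 per TB (Newegg, 2012), no bulk discount, no enclosures or labour.
DATA_TB = 25_000
PRICE_PER_TB = 109  # dollars


def replication_cost(data_tb: int, price_per_tb: float) -> float:
    """Dollars needed to buy enough raw capacity for one more copy."""
    return data_tb * price_per_tb


if __name__ == "__main__":
    print(f"${replication_cost(DATA_TB, PRICE_PER_TB):,.0f}")  # -> $2,725,000
```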


I'm sure you could get a bulk discount though.


Wouldn't making more copies of the files be exactly why they are taking the site down in the first place?


This isn't a problem, it's the 'perfect moment' for a frighteningly ambitious, disruptive startup, per HN-speak.

I think it's clear that this would be a problem in an industry where the most used tool is a bookshelf of various common interpretations of law. It just doesn't scale.


No, it's a problem.

The idea that law firms don't do IT is born of ignorance. Even the idea that the most used tool is a bookshelf of common interpretations of law is wrong; unsurprisingly, all that stuff is online these days. Why on earth would anyone with two brain cells to rub together search for case histories on paper when there's LexisNexis, which will do it in fractions of a second and is updated on a regular basis?

Yes, these people are pretty conservative, but they're not stupid, and they have money (a lot of money). Large law firms will typically have IT departments of hundreds, and there's a significant market for companies specialising in supporting the legal sector.

But this is a tough problem, not only because there are huge amounts of data involved but because people are actively trying to hinder your search. The law says that you have to disclose everything relevant, but it's pretty much common practice to also disclose a bunch of things that might be and a shed load of stuff you know full well isn't. This is a double win for the defending firm - not only can no-one accuse you of not providing everything they might want to see, but you get to do so in a way that makes it very, very difficult for them to find the stuff that matters.

Think of it this way - the problem is basically the same one Google have (already a pretty tough problem) only instead of the people providing the pages being keen that their information be discovered, they're actively trying to hide it.


The federal government does have a process for this sort of thing: if they seize an alleged drug dealer's house and that house has a mortgage, the United States Marshals Service will pay the mortgage. If they seize cars, furniture, or other assets, the government is responsible for the storage of those items until the case has been resolved. [1]

I would guess that MegaUpload's lawyers will make the claim that the data on those servers is critical to their defense and must be maintained. That is probably an accurate claim, DotCom will want to present evidence of compliance with DMCA notices, counter the claim that a "majority" of the content was under copyright, etc. Best case for DotCom would probably be that his lawyers argue for retaining the data and the judge lets Carpathia destroy it anyway. That would give DotCom reasonable grounds for appeal.

[1] http://en.wikipedia.org/wiki/Asset_forfeiture


You hit it on the head. I used to have to deal with a lot of litigation-related preservation of data... we started calling discovery "Mutually Assured Destruction".

The lawyers basically try to make things more and more onerous in order to encourage a settlement. It's amusing, as long as you aren't accountable for the data!


  1. Purchase uber-expensive mansion and set up an astronomic mortgage payment/month.
  2. Get said domicile seized by US gov't.
  3. Stall legal proceedings until Marshals have paid off the entire house.
  4. ???
  5. Profit!


It sucks, but that's the price of doing business. They chose their customer, and now are (unfortunately) tied to consequences. Same thing happens to building owners who have a crime committed by a tenant, the leased space becomes a crime scene until the police/govt are done with their investigation.


But instead of just losing use of those servers, they have to actively maintain them. It looks like they just want someone to pay for the $9000 per day that it costs to keep the servers running, they aren't looking for money that they lost from not being able to use the servers.

To go with your analogy, sure, building owners aren't allowed to rent the space back out, but they are most certainly not asked to pay the usual water, gas, or electricity bills (because those aren't/shouldn't be being used).


The owner still has grounds, building maintenance, carrying costs from a mortgage, etc. Obviously the loss would differ from a strip mall with 10 tenants vs a giant commercial building with 2.

Third parties get f'd in criminal cases all the time. One of the asset seizure stories the WSJ recently covered involved a guy whose cash happened to be in transit with an armored car company. That was, until the company's assets were seized by the feds along with his cash.

Lucky for Kim Dotcom, New Zealand let him get a hold of some of his cash for living expenses. I'm not sure he'd be so lucky in the US. Carpathia probably should have had an insurance policy for this.


I don't understand why this is the case, why can't they exchange the HDDs in the servers, tell the FBI to come with a big van and store the HDDs with the evidence on it themselves in some evidence room?


25 petabytes of hard disc is a considerable cost.

Some people have mentioned insurance. Would any policy cover you for a customer being investigated for possibly criminal activity?


Assuming 2TB internal HDDs at $120.00 apiece, you would need $1.5MM worth of drives to transfer this to, not including the man-hours associated with doing so.

The $3.2MM per year is actually sounding a lot more reasonable to keep it up and running.
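Putting the two numbers from this comment side by side: a sketch assuming 25 PB = 25,000 TB, 2TB drives at $120 each, the reported $9,000/day, and no labour or hardware beyond the bare drives. Under those assumptions the one-off drive purchase equals roughly 167 days of running cost.

```python
import math

# Assumptions: decimal units (25 PB = 25,000 TB), bare 2TB drives at $120,
# Carpathia's reported $9,000/day; labour, enclosures and redundancy ignored.
DATA_TB = 25_000
DRIVE_TB = 2
DRIVE_PRICE = 120      # dollars per drive
DAILY_COST = 9_000     # dollars per day to keep the servers running

drives = math.ceil(DATA_TB / DRIVE_TB)
drive_cost = drives * DRIVE_PRICE
annual_running_cost = DAILY_COST * 365
breakeven_days = drive_cost / DAILY_COST  # days of running cost the drives equal

print(drives, drive_cost, annual_running_cost, round(breakeven_days))
# -> 12500 1500000 3285000 167
```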


Wow, I didn't bother to calculate it. However,

- if you buy HDDs in these quantities you will get a big discount (50%+)

- they most likely get the HDDs back after the case is finished, but not the electricity bill.

- you have to factor in opportunity costs from not being able to rent out that rackspace.

- I don't even understand how this can be proper evidence if it just hangs around at a private company.


Agreed, I was doing budgetary napkin math here... there are a lot of factors, though it does seem that the $9,000/day, while MAYBE 20% higher than reality, is fairly realistic.


* HDDs have some of the lowest margins in the components business. If you're buying from Newegg you're basically getting it at cost (unless there is some conspiracy by all warehouse retailers to keep the price at >$100).

* It's the chain of custody that is important not who keeps it. Most LEOs outsource to various contractors for all types of investigations.


Don't forget replacement RAID controllers! The RAID config is usually dependent on the exact drive/controller combination, so removing the drives and not saving the controller may result in them becoming inaccessible. At the very least, they would almost certainly not pass requirements to be used as evidence if that happened.


I really hope they weren't backing 25PB of data with RAID controllers; this is well out of RAID's league and well into the territory of distributed file system techniques. I ditched RAID years ago for ZFS, and that's for only an 18TB file server at home. Start talking about thousands of servers and RAID is pretty useless.

That said, IANAL, but I think the interesting thing here is that just making copies of the hard drives is likely not enough from an evidence standpoint. With such a data set there are likely layers of data abstraction software that would need to be replicated to make any sense of it.


I would pull out the existing drives, put them in boxes for the FBI, and power down the servers. As customers sign up for machines and storage, buy replacement drives and power up the servers with the new drives.

It sounds to me like they are looking for a way to keep their income flowing with servers that they will have a hard time renting out in the near future. In other words, they just got hit with a huge excess of server capacity that they are going to have a hard time renting out short term (and probably longer term as well).


You can get insurance for "my client shafted me but I've still got bills to pay" but more typically I think you ask the client to post bond. It's called knowing your client. An ISP doing this much business with one client but not considering the possibility of MU going under is irresponsible.


Still less than the cost of keeping all those servers up and running. I would think that from a legal point of view, like other people have pointed out, MegaUpload needs to cover the costs. They are essentially insolvent, so that is what you'd need to insure against. That it is related to a legal investigation, should not matter for the insurance coverage, imo.


Why are the servers still running? Surely it's easier and cheaper to just turn off the servers, power down the racks and lock the cages?


The running cost is just part of the equation... those servers are depreciating away and generating no revenue!


I'm wondering this, too. The data would still be there, and they wouldn't have to worry about maintenance costs at least, right?


Shutting them down could easily destroy evidence, depending on how it's done. E.g.: many redundant systems try replicating data from machines that are taken offline. What happens when the full network is taken offline quickly has probably not been tested.


I see, so even just shutting down the servers hosting MU's data could be damaging to the whole network?


The building owners have to continue to pay property tax, condo fees and maintenance costs like security, gardeners and window cleaners.


I hate to say it, because I'm as sympathetic to this ISP as anyone else. But this is what insurance is for. Big single customers are risks. Surely someone out there would have been willing to write a policy to cover unforeseeable things like this.


If the insurance writer did their due diligence and realized the big single customer was MegaUpload (and what a large portion of MU's users were doing with the service), they would've been insane to write an insurance policy. I really don't think MU going down in an investigation falls under the category of "unforeseeable." It was always a question of when, not if, IMHO.


But investigations take hours, while this case will take years.


So, 25 petabytes ... 25 million gigabytes. Anyone care to guess how much of this data is illegitimate? And how much of that is under MPAA's copyright?

Back-of-the-envelope calculation: Just did a search for "1080" on some unnamed site and it appears a bluray rip of a movie encodes to roughly 10GB. So that would be 2.5 million movies in 1080p quality. I don't think we've made that many, have we? Especially if you consider that movies that came out before the "high-definition era" are encoded to about a 10th of that size (700MB-2GB roughly, afaik).

Maybe I'm missing something obvious.

Not counting TV series for instance (are they also intellectual property represented by the MPAA? I'm not in the USA so I never really dug into that).

Movies duplicated in different quality formats are usually a 10th or less of the size of a 1080p Blu-ray rip as well, so as an upper limit I could add a factor of 1.5 for that.

But then, the "long tail" of movie rips are 700-800MB and do not have duplicates.
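The movie-count estimate in the paragraphs above depends only on the average file size, so it can be parameterised. This is only a sketch: decimal units (1 PB = 1,000,000 GB) are assumed, and 10 GB, 15 GB (10 GB with the x1.5 allowance) and 0.7 GB are the comment's rough sizes, not measured figures.

```python
# How many files of a given average size fit in 25 PB?
# Assumptions: decimal units (1 PB = 1,000,000 GB); the sizes below are the
# comment's rough figures, not measurements of MegaUpload's contents.
TOTAL_GB = 25 * 1_000_000


def files_that_fit(total_gb: float, gb_per_file: float) -> int:
    """Number of whole files of size gb_per_file that fit in total_gb."""
    return int(total_gb // gb_per_file)


print(files_that_fit(TOTAL_GB, 10))        # -> 2500000 (10 GB 1080p rips)
print(files_that_fit(TOTAL_GB, 10 * 1.5))  # -> 1666666 (1080p plus x1.5 for other versions)
print(files_that_fit(TOTAL_GB, 0.7))       # roughly 35.7 million 700 MB rips
```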

Unless ... is the MPAA also representing porn? Because then all bets are off and I can easily accept that this 25 petabyte consists mostly of MPAA protected intellectual properties.

But otherwise, what percentage of these 25 petabytes would you estimate actually represents illegitimate data owned/represented by the MPAA? 2% ? 10% ?

Is that fair to the owners of the other 90% of the data? Even if it's probably mostly porn? (I'm fairly sure most of the data has to be porn)

I'm just wondering. Also because it's interesting to speculate what could be in these 25 petabytes. If you have a better guess I'd love to hear it :)


Think of the problem from the opposite direction. What the hell _else_ could the bulk of that possibly be?

I've got 4TB of storage on my media server, last time I checked at ~60% used. At _best_, possibly 0.5% of that is stuff that I've personally created and have copyright over. Hell, all the email I've sent _or received_ that wasn't spam filtered since mid 1995 only comes to a few hundred meg - including attachments! Something less than half a TB of it is music which I have some kind of right to have as digital files (some of it purchased as files, some of it ripped from CD and vinyl - which is somewhat less clear legally with respect to my rights to have a "copy" as a file on my hard drives). Realistically, outside of academia and industry (who presumably aren't significant users of MegaUpload), chances are so close to 100% that any 1GB+ file is copyright encumbered in a way that gives the MPAA an interest that it doesn't matter. The nearest I could come to justifying the rest is that some of it is "time shifted" TV (from the PS3 TV tuner/PVR), some of it is DVD backups for discs I own, some might euphemistically be referred to as "timeshifted DVD rentals or loans", but a _lot_ is copyrighted content found on "channel BitTorrent" or downloaded from YouTube. If the copyright police confiscated _my_ hard disks and catalogued them in front of a judge, I'd have a very hard time looking him in the eye and saying "I didn't think I was doing anything wrong!"

I fear that line of argument bodes badly for dotcom…


"Hell, all the email I've sent _or recieved_ that wasnt spam filtered since mid 1995 only comes to a few hundred meg - including attachments!"

!!!

I drown in approximately 2GB/year and that's after deleting some stuff and many attachments. Do you make a lot of phone calls or hand-write letters? Or not run a business maybe?


Just because you don't happen to have activities that generate enormous amounts of data doesn't mean nobody else does.

There's also people that make a lot more video data than you apparently do, same for sound recordings. Do you believe that the majority of recorded video data in the world is MPAA's? That would go against everything we know about the "long tail". You do realize that of all text (or books) written in the world, less than 1% actually gets published? Why would video or audio be any different? Just because you don't produce it, doesn't mean the MPAA industries are the only ones that do.

Also, as you say, there's academia: PhD students I know use equipment that generates gigabytes per second. Or, without equipment, there are computer programs that do it from calculations. Now, I agree it doesn't seem likely they'd use MegaUpload for that.

And that's just a few things I can come up with right now. Who knows what sort of computer stuff other people do that generates retarded amounts of data they need to share?

I'm also not saying that all that copyrighted stuff isn't there, it's just that if it's 25 petabytes worth of data, it just doesn't add up, the MPAA-represented copyrighted part of those 25 petabytes can only be a tiny fraction of that amount.


" … it just doesn't add up, the MPAA-represented copyrighted part of those 25 petabytes can only be a tiny fraction of that amount."

Interesting. My completely uninformed assumption is diametrically opposite to that. I'm guessing there's not much sophisticated de-dup going on, and _way_ more of the diskspace on MegaUpload is probably various bit-wise different rips of the same smallish set of Hollywood blockbusters. And while I agree the long tail suggests there are almost certainly lots of people out there with lots of non-MPAA-copyright-encumbered files, I'd be quite surprised to find the area under the "long tail" was within 2 or 3 orders of magnitude of the "fat head" occupied by all the copies of all the DVD rips and broadcast TV recordings.

I wonder if there's any believable data anywhere to see whether I'm wrong?

(Note: I've got a non-US-centric view of this too. Here in Australia, internet connection plans lag behind the US in terms of speed and bandwidth caps, so even though I've got friends who generate lots of GoPro footage, for example, they'll in general be storing it on locally attached hard drives, not trying to push gigabytes of raw data out into "the cloud". That might explain why I make the possibly-incorrect assumptions that I do…)


Thoughts on ameliorating that risk…

If I start hearing about individuals being charged for content they have on their own machines (as opposed to content they're sharing via p2p), I'll invest in a few OpenWRT-capable wifi/ADSL routers, plug in some large disks, distribute them amongst nearby friends/neighbors (or out-of-wifi-range friends with adequate bandwidth), and run Tahoe on them all in a configuration that means it's _provable_ that my sections of the encrypted files do not contain enough to decrypt into any identifiable copyrighted work.

That's still not foolproof, but adding the requirement that "they" identify the existence of a network of storage devices all storing encrypted segments of files, and then having them need to confiscate enough of them as well as my disks should put me that one step ahead of the "lower hanging fruit".


Frankly, the idea of pinning the MPAA's actions on mostly protecting porn is a brilliant angle to fight back with in political arenas, where the MPAA tries to play on human emotions and claims pirates are taking jobs away from honest people and our children are in trouble if pirates keep sharing their works.

Save your children; MPAA cares mostly to keep servicing them porn; end them now!


Thanks. Yeah, the porn bit didn't even occur to me as I started writing the post. I just wondered "25 petabytes is an awful lot of data, although numbers on data can get pretty big easily, let's see if I can come up with a reasonable estimate ...", and then, "... what else could be there? TV series--waitaminute--PORN!"

So yes, PORN.

(Unrelated, your username is awfully similar to mine..)


don't jump in the mud with them, keep yourself clean and argue user rights and freedom.


As an idea of just how much a petabyte is, from their motion:

> Each petabyte is equal to approximately 1,000,000 (one million) gigabytes, or the storage capacity required to store “…about 13.3 years of HD-TV video. About 50 Libraries of Congress.” (http://www.nasa.gov/centers/langley/news/researchernews/rn_d...)

I was guessing games and software make up some data. I just had a look at a torrent site. They have 400 pages, with 25 items per page. At roughly 5 GB per item that's still only about 50 terabytes.
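The torrent-site arithmetic above, with the share of 25 PB it represents added. A sketch assuming ~5 GB per item and decimal units (1 TB = 1,000 GB, 25 PB = 25,000 TB):

```python
# Catalogue-size estimate for one torrent site's games/software section.
# Assumptions: 400 pages x 25 items, ~5 GB per item, decimal units.
pages, items_per_page, gb_per_item = 400, 25, 5

total_tb = pages * items_per_page * gb_per_item / 1_000
share_of_25pb = total_tb / 25_000  # fraction of the seized 25 PB

print(total_tb, f"{share_of_25pb:.1%}")  # -> 50.0 0.2%
```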


Megaupload has multiple copies of almost every single movie and TV show in the world in multiple formats and quality levels.

I have watched some very rare international films on Mega and it never failed me on a search.

If you multiply out all the movies, all the multiple copies, all the TV show episodes, all the different quality levels, you would get a few petabytes.


> Megaupload has multiple copies of almost every single movie and TV show in the world in multiple formats and quality levels.

> I have watched some very rare international films on Mega and it never failed me on a search.

> If you multiply out all the movies, all the multiple copies, all the TV show episodes, all the different quality levels, you would get a few petabytes.

I implore you to do a back-of-the-envelope estimate and get to 25 petabytes.

Please take into account that I purposefully started with the highest quality level currently in use, which is ~10GB per movie. The assumption is that lower quality levels will only be a fraction of that size therefore all versions of a single movie won't be more than 20-50GB total. Additionally, many older movies are not available in the high quality formats and take up only a tenth of this space.

Also I'd like to point out that the MPAA is the Motion Picture Association of America, which is an American trade association that represents the six big Hollywood studios. So they don't represent the "rare international films".

Still, I'm just making rough estimates myself as well. I'd love it if you could come up with a possible way to explain that, say, at least half of this data could consist of MPAA represented intellectual property.

I just don't think they produced nearly enough content to warrant shutting down all the data stored in MegaUpload, even given that MegaUpload probably does store all their IP illegally, in multiple formats.


I did, hence: "you would get a few petabytes."


I would wager that a good percentage of the data is completely redundant.


In civil or criminal asset forfeiture, the state can conceivably confiscate property if used for or if it enables a crime. In some jurisdictions it doesn't even matter if the owner of the property and the criminal have really nothing to do with each other. (i.e. Your stolen SUV was used to rob a liquor store.)

Also, the government could have probably seized everything anyway as evidence. The problem with that is setting up that much rack space and network infrastructure isn't cheap.

That's Carpathia's basis for compensation. They are providing a service to the government. Seems like a no-brainer.


Help me out with the math here:

1 terabyte costs them $128.41 per year, right?

Amazon S3 would cost them roughly $444 per year, if they were using the Reduced Redundancy Storage.

The cheapest HD that I see on pcpartpicker (in terms of Price/GB) is the Western Digital Caviar Green 2.5 TB (5400 RPM) for $135.43, which is $0.054/GB. That's $54.17 per TB.

If you want a single backup, that's $108.34 per TB. Two backups (3 copies of each file), is $162.51 per TB.

So, if I'm doing this right, as long as their HDs last at least 15 months on average, $128.41 per TB per year would pay for triple redundancy at the cheapest consumer price per GB. And I'm not even counting their power, network, cooling, or puny humans to maintain it all. That means their HDs, even if they were made out of the cheapest parts I could find, would have to last significantly longer than 15 months on average.
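The calculation above can be reproduced in a few lines. One assumption here is an inference, not something the commenter stated: the quoted $128.41 per TB per year comes out exactly if 25 PB is counted as 25,600 TB and a year as 365.25 days, so the sketch uses those values. Because it multiplies the unrounded per-TB drive price, it gives $162.52 rather than $162.51 for three copies.

```python
# Per-TB storage cost versus cheapest consumer drives, triple redundancy.
# Assumptions: $9,000/day; 25 PB counted as 25,600 TB and a 365.25-day
# year (this reproduces the quoted $128.41); WD Caviar Green 2.5 TB at $135.43.
DAILY_COST = 9_000
DATA_TB = 25_600
cost_per_tb_year = DAILY_COST * 365.25 / DATA_TB

drive_price_per_tb = 135.43 / 2.5
copies = 3  # the original plus two backups
redundant_cost_per_tb = drive_price_per_tb * copies

# Months of the $/TB/year budget needed to buy three copies' worth of drives.
breakeven_months = redundant_cost_per_tb / cost_per_tb_year * 12

print(f"{cost_per_tb_year:.2f} {drive_price_per_tb:.2f} "
      f"{redundant_cost_per_tb:.2f} {breakeven_months:.1f}")  # roughly 15 months
```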

They're actually doing really well on price, if you ask me.

Or am I missing something obvious, or doing the math horribly wrong?


The hardware cost $1.25 million. It costs $9,000/day in electricity/connectivity/rackspace.


True. I'm honestly surprised they didn't declare daily depreciation, for $1.25 million in assets that are obsoleted by new technology at Moore's pace, I'd expect that could arguably be quite high.


they need to get a lawyer and work out the real cost. add in lost business opportunity, staff costs, brand damage etc. and you start hitting numbers where the government would have to do something and you may actually start getting compensated.

it is bullshit that they have to foot such a large cost for a government investigation.


Got it. So, yeah, I missed something completely obvious.


Yeah, I think you are exactly right. At $9K/day for that much data, this looks really good. That is ~$0.36 PER TB PER DAY to store it.

Christ, Dropbox charges me $10/MONTH for 50GB, the equivalent of $6.67 PER DAY in storage costs.

For my 50GB, I am charged $.33 per day.
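A sketch of the per-terabyte comparison in this exchange, assuming 25 PB = 25,000 TB, a 30-day month for the $10 Dropbox plan, and 1 TB = 1,000 GB:

```python
# Daily storage cost per TB: Carpathia's bill versus a consumer Dropbox plan.
# Assumptions: 25,000 TB; $10 per 30-day month for 50 GB; 1 TB = 1,000 GB.
carpathia_per_tb_day = 9_000 / 25_000
dropbox_per_gb_day = 10 / 30 / 50
dropbox_per_tb_day = dropbox_per_gb_day * 1_000
ratio = dropbox_per_tb_day / carpathia_per_tb_day  # how many times pricier

print(f"{carpathia_per_tb_day:.2f} {dropbox_per_tb_day:.2f} {ratio:.0f}x")
# -> 0.36 6.67 19x
```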


I don't know why I was downvoted, but the $6.67 per day cost is PER TERABYTE

I think whoever downvoted that thought I was referring to the per GB cost.


The math looks good to me. I think they are saying that it's costing them $128.41 per TB per year just to continue to run it, i.e. considering the hardware itself a sunk cost.


It's the storage disks the government needs, not the rest of the server hardware. If they can't come up with an agreement, then shut down all the servers, take out the disks, catalog them, and put them in a warehouse somewhere. They are then free to re-use the rest of the servers for something else.

That would satisfy the needs of the government if they need access to the data, preserve it if in the future people are allowed to download it, and prevent the MPAA from complaining that it was given back to Megaupload.

I'm sure the cost of storage would not be minimal, but they could still use the rest of the hardware and not have to keep the servers powered up.

Possible problems:

- Maybe the servers can't be shut down and brought back up without certain passwords or encryption keys

- Labor cost of shutting down and cataloguing all those disks (if done progressively it would probably work)

- Others?


You want to pull 1,103 servers' disks?

The compatibility problems you have trying to get data off disks in a hardware RAID make it impractical (do you have the EXACT SAME hardware revision & firmware? Without this you can't guarantee you can read it back). It's either that or you have to pay data recovery guys to rebuild it.

Not to mention hard drive costs are still high post-Thai floods. For enterprise gear we are getting most quotes at ~$300 AUD a disk; for consumer gear it's ~$130. Most servers are running at least 2 disks... that's $286K worth of disks alone if you're talking cheap, low-capacity disks, not including labour to change the disks and test the hardware before you deploy a workload to it.


>The compatibility problems you have trying to get data off disks in a hardware RAID make it impractical (do you have the EXACT SAME hardware revision & firmware? Without this you can't guarantee you can read it back). It's either that or you have to pay data recovery guys to rebuild it.

Forgive me for sounding like a member of Anonymous, but..

So? That's the government's problem. I don't see why a private company should be in any position where they're required (at wallet or gun point) to help in an investigation at their own expense. Pull the drives, warehouse them, and let the FBI do what they have to do. They have IT to rebuild the RAIDs.


Carpathia wants to be paid $9000 a day like they were before the Megaupload case started and pulling the drives and telling the FBI that it's their problem doesn't do that.


No, but it stops the costs from accruing. Legal action could then be brought to recover their other costs.


Well, as I said, they could do it piecemeal (10 servers at a time or what have you), and it would require that the data (hardware/versions/etc.) for the source machine be archived with the disks (good cataloguing).

Also $286K / $9k/day ~= 32 days, so after 32 days it costs them more to keep it all running than to simply replace the disks.

Not to mention they could now lease those servers to another client and make money off of them.
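The disk-swap break-even from this exchange, as a sketch. Assumptions: 1,103 servers with 2 consumer disks each at ~$130 (the cheap end of the quotes above, which were in AUD; like the comments, this treats them as roughly equal to USD), against the $9,000/day running cost, with labour and testing ignored:

```python
# Cost of replacing every server's disks versus days of keeping it all powered.
# Assumptions: 1,103 servers, 2 disks each, ~$130 per disk (AUD treated as USD),
# $9,000/day running cost; labour and hardware testing are not included.
SERVERS, DISKS_PER_SERVER, DISK_PRICE = 1_103, 2, 130
DAILY_COST = 9_000

disk_cost = SERVERS * DISKS_PER_SERVER * DISK_PRICE
breakeven_days = disk_cost / DAILY_COST  # after this, running costs exceed the swap

print(disk_cost, round(breakeven_days))  # -> 286780 32
```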


Yeah, I think you would also have to consider that with a fairly standard equipment lifecycle program you would have 1/3 to 1/4 of the equipment up for renewal in the next 12 months. Maybe you bring that renewal forward... and maybe you spec some nicer hardware (new Xeon E5, etc.) and are able to get better consolidation happening so you don't replace everything.

I just wonder if they would be in a better financial position declaring it a loss and claiming the tax break / insurance than actually doing the work?

As long as they can support the hit to the cashflow, they are likely to be able to claim the damages back from either one of the parties of the case once it's decided and/or from insurance.


Some people have legitimate files on MegaUpload. They can no longer access them. Should they be able to get those files back somehow?


People with legitimate files can't access them now anyway; this would not change that. I'm just proposing a way out for Carpathia while the case is decided in court.


Correct, but perhaps after this ends, MegaUpload could return the legitimate files. I don't expect the government to do the same. And I very, very much do not expect the MPAA to do the same.


This is why the US Government shouldn't have seized the site first and asked questions later. They should've taken them to trial, let them keep hosting the data, and, if they were found guilty, then taken it down.


I agree, standard operating procedure when dealing with digital information should always include a generous window for destruction of evidence.


I disagree. Servers should be inaccessible, but not at all accessed by the law enforcement agency coming up with claims in court - why would they need to look at that amount of data anyway, and why should taxpayers suffer?

Just - put them behind bars. I said so.


The urgency is because Carpathia's lease has run out, they can't stay at the $9k/day facility.

Carpathia has to pay $65k to move the servers, then $37k per month to keep them in a climate-controlled facility while powered down. Lost profits are still a relevant consideration. This is a doozy of a damages calculation. What's depreciation on assets that are rendered obsolete by (something like) Moore's law?

I'd say Carpathia deletes the data and then supports the petitioners (those with lost data) in the takings clause case against the government. Carpathia claims indemnity against claims by pointing at MegaUpload and the Feds, but probably gets joined in a bunch of messy lawsuits. Real roll of the dice.
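For a sense of scale on the moving option, a sketch of cumulative spend under the figures quoted above ($65k one-off move, $37k/month powered down) against the $9,000/day facility. It assumes 30-day months and ignores depreciation and lost profits; since the lease has reportedly run out, staying put isn't actually available, so this only shows magnitudes.

```python
# Cumulative spend: staying in the $9k/day facility vs. moving and powering down.
# Assumptions: 30-day months; $65k one-off move; $37k/month in storage;
# depreciation, lost profits and legal costs are left out.
DAYS_PER_MONTH = 30
stay_per_month = 9_000 * DAYS_PER_MONTH
move_once, stored_per_month = 65_000, 37_000


def cumulative(months: int) -> tuple[int, int]:
    """Total dollars spent after `months` as (stay, move-and-store)."""
    return stay_per_month * months, move_once + stored_per_month * months


for m in (1, 6, 12):
    print(m, *cumulative(m))
```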


Why are options that would destroy any chance at Megaupload conducting business in the future even on the table before a trial is finished? I suppose a large amount of damage is already done, but it would be a gross injustice to kill their business before anything started. The government should pay to keep this up until they've conducted their trial. If they don't, and they lose somehow, I hope they get hit with a massive countersuit.


For reference, this amount of data would require 190 Backblaze storage pods ($7,384 for 135TB), totaling $1.4 million.


Is that pre- or post- Thailand flooding prices? (Have drive prices settled down again?)

And is there any redundancy included in your figures? (25 petabytes / 135 terabytes == about 190 pods)


Backblaze quoted this price before the flood, when a 3TB drive cost $120. I'm not sure whether post-flood prices are back to that level, or what prices you can expect at this volume.

This figure doesn't include redundancy. Backblaze uses RAID 6, so the usable capacity is actually 117TB per pod[1]. With this configuration the final cost should be closer to $1.6 million.
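A quick check of the pod arithmetic, assuming decimal units (25 PB = 25,000 TB) and the per-pod figures quoted in this thread (135TB raw, 117TB usable under RAID 6, $7,384 per pod). Rounding up to whole pods gives 186 and 214 pods:

```python
import math

POD_PRICE = 7_384        # USD per pod, as quoted in the thread
RAW_TB_PER_POD = 135     # raw capacity per pod
RAID6_TB_PER_POD = 117   # usable capacity per pod after RAID 6
DATA_TB = 25_000         # 25 PB in decimal terabytes


def pods_needed(data_tb: float, tb_per_pod: float) -> int:
    """You can't buy a fraction of a pod, so round up."""
    return math.ceil(data_tb / tb_per_pod)


def pod_cost(data_tb: float, tb_per_pod: float) -> int:
    """Hardware cost of enough whole pods to hold `data_tb`."""
    return pods_needed(data_tb, tb_per_pod) * POD_PRICE
```

That works out to about $1.37 million raw and about $1.58 million with RAID 6, in line with the $1.4 and $1.6 million estimates. It still leaves out racks, power, networking, and any second copy.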

  [1] http://hardware.slashdot.org/comments.pl?sid=2341206&cid=36834390


> $120 for 3TB

So even if you throw out the rest of the hardware and have zero redundancy, $40/TB * 25000 terabytes = $1,000,000.

(Of course, somebody has to ship and store over 8,000 HDDs, too.)
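The same back-of-envelope sum in code, assuming decimal terabytes, whole 3TB drives, and the pre-flood $120 price. `price_multiplier` is a hypothetical knob for modeling post-flood prices:

```python
import math


def raw_drive_cost(data_tb: float, drive_tb: float = 3, drive_price: float = 120,
                   price_multiplier: float = 1.0) -> tuple:
    """Return (number of whole drives, total drive cost in USD), no redundancy."""
    drives = math.ceil(data_tb / drive_tb)
    return drives, drives * drive_price * price_multiplier
```

`raw_drive_cost(25_000)` gives 8,334 drives and about $1.0 million. Passing `price_multiplier=2.0` roughly models the post-flood doubling.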


Prices have settled down in the sense that they're fairly stable now, but a terabyte still costs roughly double what it did before the flooding.


Can someone more familiar with this stuff explain why Carpathia's still paying for "power and connectivity"?

I would have assumed that the FBI would have actually seized the servers, or at the very least pulled the network cables out.


Not if the evidence is on the servers.


I'm really confused by this. Is Megaupload (or any megaupload employee) facing a criminal trial? How can any "evidence trial" (or whatever they call it) be maintained if a law-enforcement agency doesn't have the drives?

Have any hashes been taken of the drives?
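Hashing is the usual way to show that a copy matches the original. A minimal sketch, assuming a raw image of one drive is readable as an ordinary file (any path you pass in is hypothetical). Streaming the image in chunks keeps memory use flat however large it is:

```python
import hashlib


def sha256_of_image(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a (hypothetical) raw disk image."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

You would record the digest when the image is acquired and recompute it later. Matching digests show the bits haven't changed since then, though not who had access in between.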


The cost of moving that amount of data is crazy. I'm surprised the government hasn't seized the hardware and chucked it in a warehouse.

I did some back-of-the-envelope calculations... and it's absolutely crazy. Tape would require over 17,000 Ultrium tapes. Now you could de-dupe... but the hardware to process and de-dupe that much data... not really an option. Not to mention the time to write that many tapes...
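A sketch of the tape arithmetic, assuming LTO-5 (1.5TB native capacity, 140MB/s native transfer per drive) and no compression. The comment doesn't say which Ultrium generation it assumed:

```python
import math

LTO5_NATIVE_TB = 1.5      # LTO-5 native (uncompressed) capacity per tape
LTO5_NATIVE_MB_S = 140    # LTO-5 native transfer rate per drive
SECONDS_PER_YEAR = 365 * 24 * 3600


def tapes_needed(data_tb: float, tape_tb: float = LTO5_NATIVE_TB) -> int:
    """Whole tapes needed to hold `data_tb` decimal terabytes."""
    return math.ceil(data_tb / tape_tb)


def write_time_years(data_tb: float, mb_per_s: float = LTO5_NATIVE_MB_S,
                     drives: int = 1) -> float:
    """Years of continuous streaming, assuming drives run in parallel at full speed."""
    seconds = data_tb * 1_000_000 / (mb_per_s * drives)  # TB -> MB
    return seconds / SECONDS_PER_YEAR
```

That gives about 16,700 tapes, the same ballpark as the figure above, and about 5.7 years of continuous writing for a single drive, or roughly three weeks spread across 100 drives.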

Something like thumpers (48-disk Sun x86 boxes) would be expensive; last time I looked they were around $30k each for a large order. That's 160TB usable, assuming 4TB disks and the thumper split into 4 RAID 6 arrays... so 160 thumpers... $4.8 million.
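Checking the thumper numbers, assuming the layout described (48 disks of 4TB each, split into four 12-disk RAID 6 arrays) and $30k per box:

```python
import math


def raid6_usable_tb(disks: int, disk_tb: float, arrays: int) -> float:
    """Usable capacity when `disks` are split evenly into RAID 6 arrays."""
    disks_per_array = disks // arrays
    # RAID 6 spends two disks' worth of capacity per array on parity.
    return arrays * (disks_per_array - 2) * disk_tb


def chassis_needed(data_tb: float, usable_tb_per_chassis: float) -> int:
    """Whole boxes needed to hold `data_tb`."""
    return math.ceil(data_tb / usable_tb_per_chassis)
```

That is 160TB usable per box and 157 boxes, or about $4.7 million, which the comment rounds to 160 boxes and $4.8 million.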

Even backblaze pods would likely be well over a Million...

This doesn't even cover hosting costs, transfer and such. Not to mention to be usable in court there are going to have to be processes in place to document compliance and validity of the copy....

All in all not a great place for Carpathia to be in.


> Now you could De-dupe... but the hardware to process and dedupe that much data.... not really an option.

FYI, the data is already comprehensively de-duped.


I know they dedupe at the file level, but I wonder if they do block-level deduping. Without a big shared storage infrastructure, block-level deduped data becomes pretty hard to serve at high speed, because the reads for a single file can end up distributed across hundreds of nodes...

To build out web-scale systems you generally use commodity gear and accept the overhead of duplication. Heavy deduping requires massive IO, and I can't see any way to deal with that much data at that level of IOPS and still be profitable charging what they charge.
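To make the file-level versus block-level distinction concrete, here is a toy sketch. It assumes whole files fit in memory and uses fixed 4KB blocks with SHA-256 as the dedupe key; real systems often use content-defined chunking instead:

```python
import hashlib


def file_level_unique_bytes(files: list) -> int:
    """Store each distinct file once, keyed by the SHA-256 of its contents."""
    store = {hashlib.sha256(f).hexdigest(): len(f) for f in files}
    return sum(store.values())


def block_level_unique_bytes(files: list, block_size: int = 4096) -> int:
    """Split files into fixed-size blocks and store each distinct block once."""
    store = {}
    for f in files:
        for i in range(0, len(f), block_size):
            block = f[i:i + block_size]
            store[hashlib.sha256(block).hexdigest()] = len(block)
    return sum(store.values())
```

Two files that share a prefix dedupe only at the block level. Note what the sketch hides: in one dictionary every block is local, whereas at Megaupload's scale the unique blocks of a single file could live on many different nodes, which is the read fan-out problem described above.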


This post really hurts my soul.

Can Carpathia sue the Federal Government for NOT seizing the assets? It's the data, not the hardware. Data is transferable. If they want it, they can take it.

Can Carpathia sue? This kind of injustice just makes me boil.


Pardon my ignorance, and this is a serious question, but why can't they just turn them off? I realize it doesn't address all the costs, but surely it could reduce them significantly.


While I'm obviously uninvolved, there are several reasons why turning them off might be a hassle. The main one is that if anything is encrypted, the encryption keys are currently in memory and the disks are already unlocked; turning the servers off might require re-entering the keys or passphrases.

Alternatively, in a setup of this size, I imagine there'd be no end of redundancy configurations: RAID for individual disk sets, DRBD (or a SAN equivalent) across servers. Turning them off would turn all that HA tech off, meaning that when you try to bring the system back up, the redundancy implementation might say "oh no, I've lost x peers from my set of n" and fail completely.

Shrug. I've no doubt explained it badly, but there are good reasons to keep them running. It's not just a case of "pop the hard drive out and use it elsewhere"; the logic associated with keeping 25 petabytes of data would also have to be restored to its current state.
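A toy illustration of the two failure rules mentioned, assuming a generic RAID 6 array and a generic majority-quorum rule. How DRBD or a particular SAN actually behaves after a power cycle depends on its configuration:

```python
def raid6_survives(failed_disks: int) -> bool:
    """RAID 6 keeps two parity blocks per stripe, so any two disks can fail."""
    return failed_disks <= 2


def has_majority_quorum(peers_up: int, total_peers: int) -> bool:
    """A common clustered-storage rule: stay online only with a strict majority."""
    return peers_up > total_peers // 2
```

If servers come back one at a time after a shutdown, the first few nodes may see fewer than a majority of their peers and refuse to serve data until the rest return or someone intervenes manually.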


This gives an idea of the economic activity generated by services like Megaupload, and of what is being removed from the economy by killing the company: roughly $3.2 million a year in hosting fees, possibly more if that's just the cost price, plus salaries, over $1 million in hardware, and the various other suppliers. One wonders about the GDP of the recording and movie industries relative to the businesses they're going after.
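The $3.2 million figure is the quoted daily rate annualized. A one-line check, assuming the $9,000/day rate holds all year:

```python
def hosting_cost(days: int, daily_fee: int = 9_000) -> int:
    """Total hosting cost at a flat daily fee."""
    return daily_fee * days
```

`hosting_cost(365)` comes to $3,285,000.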


What's the big deal? Just delete the data. Customer pays for storage. Company stores. Customer stops paying for storage. Company deletes.

Sure, a bunch of pissed off people will certainly be upset - but it's not the company's fault - they shouldn't have to bear this burden. I can't see how they could be sued by users for this, they didn't enter into any kind of agreement with the users, only with the customer.


The customer (MegaUpload) would like to continue paying for the storage; they have been prevented from doing so by a third party (the government).


Thing is - the company stopped paying because a couple of cowboys are currently checking if the town is big enough for them both.

Users don't care about that. Yes, some users are totally legitimate. Dropping their data because someone, somewhere thinks that the original service in general was 'evil' is not a solution that is as straight-cut as you'd like to present it.


The article also mentions that they need to hold onto the data because it may be used as evidence in the court case.


It boggles my mind that the court hasn't taken possession of the hardware/data, then.

If the hardware is left running, and not in official custody, how do any authorities know that the data isn't being tampered with?


I can definitely understand the lost potential revenue of having unused servers. But I wonder why they are saying that cost includes power and connectivity for the servers? Seems like they would be powered down. I would actually have assumed the servers to be confiscated and taken off-premise by the FBI.


It's possible (though I'd doubt it is the case) that those servers have data from other clients as well.


Any large datacenter has storage allocated in complex ways. It may be challenging to isolate one customer's data physically from another's. For instance, physical disks can be virtually concatenated and then repartitioned into virtual storage containers, which may reside on part of one physical disk or on parts of many.

They could of course migrate all of the contested data onto new storage. But it's large; who would pay for that?


That makes the most sense to me.


It is possible that all the incriminating evidence, i.e. the hard drives, is encrypted, but the drives are currently mounted and hence decrypted. Pull the power and you might need someone (who might be outside the US courts' reach) to enter the password.

Pull the power and maybe all the evidence goes away...


There are legal obligations for the government to reimburse telco companies if they are asked to spy on their customers on the government's behalf. Also, obviously, if you want to use data as evidence in a trial, it needs to be stored safely by the police and sealed off, to ensure that its integrity is preserved.

So either the government needs to pay up, store the drives themselves, or dismiss these thousands of hard drives from the witness bench.

Also, I can't see how the EFF's claim has any legal merit. There's no obligation for a site to let you access data you sent them.


Megaupload had a lot of assets that were frozen. I don't know about the legalities, but it would be reasonable to use the frozen funds to pay for this.


I really doubt that would be legal. It would be akin to the government forcing you to pay for them to prosecute you.

Of course, I think the whole category of forfeiture law is blatantly contrary to the Fourth Amendment, so what do I know.


> It would be akin to the government forcing you to pay for them to prosecute you.

Taxes?


What if they are not found guilty? Does the Government pay it back from Social Security?


I know any legitimate hosting company would never do this, but it would be amazing if they just "happened" to have very loose security on the servers that hold Megaupload's data, and if some hacker were to..."gain unauthorized access" and wipe all the data.

They wouldn't be held responsible for a breakin, would they?


If a court orders them to take "reasonable, and industry best practice computer security approaches to prevent the data being lost", then, yes, they would be responsible for "accidentally-on-purpose" leaving the servers accessible. If it were to happen, someone would do a post-mortem, find out that they intended for it to be broken into, and then they would be guilty of contempt of court/destroying evidence/etc.


Yeah, realistically it just wouldn't be worth it. But it's fun to imagine.


Are there seriously people that used Megaupload as their sole and only place to store their data? What if there was a fire in the server room? Or some MU intern typed rm -rf /?


Believe it or not, there are even people who don't have backups at all. I know, right?


Not backing up your important data is passive stupidity.

Backing up your data on Megaupload and then not keeping a local copy is active stupidity of such vast scale that I refuse to believe it has ever happened in the real world.


>Backing up your data on Megaupload and then not keeping a local copy is active stupidity of such vast scale that I refuse to believe it has ever happened in the real world.

To borrow a famous quote... nobody ever went broke underestimating the intelligence of the public.

That said, having your backup service raided by LEOs toadying to entertainment conglomerates was probably not even on people's list of possible downtime concerns...


> Backing up your data on Megaupload and then not keeping a local copy is active stupidity of such vast scale that I refuse to believe it has ever happened in the real world.

The purpose of having a backup is so that you can restore your data after the original is lost, because there is a good chance the original will be lost. Explicitly retaining it on your local system is not enough to keep it safe. If it was, there would be no reason to backup in the first place.

If a million people used the service for backups, it wouldn't be unreasonable to expect several of them to have a drive failure each and every day.
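A rough check of this argument, treating failures as spread evenly over the year. The 3% annual failure rate is an assumed round number, not a figure from the thread:

```python
def expected_daily_failures(drives: int, annual_failure_rate: float) -> float:
    """Expected drive failures per day, assuming failures are uniform over a year."""
    return drives * annual_failure_rate / 365
```

With a million drives at an assumed 3% per year, that is about 82 failures a day; even at 1% it is about 27, so "several" is conservative.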


Do you really think a million people used Megaupload for backups, when the service was nearly completely unusable for that task?

Do you think they were zipping up the contents of their own systems and then uploading them daily?


Making Megaupload pay this $9000/day seems unfair, too. The US government has cut off all of Megaupload's revenue streams, and so they would be forcing Megaupload to keep paying for a service that they can no longer make money from.

Regardless, why is the cost so high if the server is down? Does this $9000/day reflect the loss that Carpathia suffers from not re-allocating this storage to other customers? It would seem to me that given Megaupload's current state, it would be sufficient to leave the servers powered down and unplugged until the legal dispute is resolved... surely the cost of leaving a server idle is not $9000. I don't really know though...


Wait, why do they need to keep the power and connectivity on if the servers aren't being actively accessed? That would save a bunch of money if they were kept off.


"and argues that if that data needs to be preserved, someone else—the government, Megaupload, or an interested party such as the MPAA or EFF—should bear the costs of preserving the data"

Fucking exactly!! Have fucking MPAA pick up the tab.

EDIT: it's going to be amazing (and it will take years, for sure) to see whether this bites the MPAA in the ass if the judge rules that yes, they do have to pay. Would looove to see that. This should actually be a rule of thumb: if the MPAA believes someone is infringing, a lawsuit is entirely fine, but you guys (MPAA) pay to keep the lights on in the meantime.


I wonder if this is an opportunity for a startup, and/or an insurance product sold to SaaS end users, hosting facilities, or developers.


I don't feel one bit of sympathy for Carpathia; they most likely had all the warning signs on their door (DMCA notices, legal notices, and more), but they willingly provided service to a company with garbage morals.


In my mind, the MPAA is certainly the best choice to pay these costs. They're the ones with the problem, they should be the ones to bear the burden. Especially considering Megaupload offered to take the data and they explicitly forbade it. If the MPAA didn't like the solutions offered, but can't come up with something better, then I think Carpathia should get to do what it wants.

Edit: The opinion above has NO legal basis whatsoever. As many have pointed out, it's not even legally possible. I made this comment solely from a "In a perfect world..." standpoint.


The data is being held (from what I understand) as part of an FBI investigation. The MPAA may be responsible for the case existing and bringing the supposed crime to light but the FBI are the people trying the case and this is beyond the MPAA now. The responsibility lies with the government agencies.


The FBI, as the MPAA's minions, should be the ones paying for it. Either they "confiscate" the data or they don't. If they do, they have the responsibility to keep it up; if they don't, they should have to give it back to its "owner".


Unwitting puppets perhaps, but I doubt the people of the FBI want very much to be dealing with copyright drama. That's like fixing bugs in form code to them. It must feel like a complete waste of their time and talent.


Then they should choose not to do it.


There's no legal basis for the MPAA to bear the cost of storing the data. The MPAA is not even a party to the proceedings, which, by the way, are criminal and not at all civil.


So this confuses me a little, I would be glad if someone could explain it to me: why does the MPAA get a voice in the proceedings, if they are not a party to it?


They are the supposed victims of the alleged crime.


I could be wrong, but I believe the government's position is that they have already copied everything that they want, and that Carpathia was free to destroy the data. If they are continuing to store it of their own volition, then they are fully responsible to pay for it on their own. Even if their reason for preserving the data is for fear of a lawsuit by Megaupload or others, that is entirely their responsibility if there is no Government requested court order in place that requires them to preserve it as evidence.

If they have been ordered to hold the data/servers as evidence, then the Government, as the requesting party, is solely responsible for the costs. If and when any party is found guilty at trial, the government can ask that the guilty defendant(s) be held financially responsible for all costs of prosecution, which would include reimbursement of these expenses.


The MPAA can pay for it from a small part of the huge new profits they are seeing after stopping this international pirating operation.


>the huge new profits they are seeing after stopping this international pirating operation.

I seriously hope that was sarcasm.


As a matter of opinion, I agree. If someone claims they own something and they want to fight for it they should take responsibility for it. Property ownership comes with conditions and responsibilities. From an ethical standpoint, why should intellectual property be different? Do they own the data or not?

Expanding on this, if a company wants to claim that it always owns a particular combination of bits, should it pay rent for storing it on my system? The problem is that people are generally short-sighted and focused on the immediate future. Other considerations are rarely thought about until other people insist that something will affect their immediate future. There are costs associated with data that are rarely considered at the moment. No one wants to talk about this, because it may change the way they do things and cost them money.


Technically, I think Carpathia can recover their costs from megaupload, even with the contract being terminated. If megaupload can't pay, then the insurer(s) should step in.


At that price, it would only take them 150 days to stop losing money if they started building some Backblaze servers (http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v...). 25,000TB / 135TB * $7,384 = $1,367,407 minimum cost of commercial hardware to store that much.

"historically and mind-bogglingly large amount of data" - you could say that again.
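The payback arithmetic above in code, assuming fractional pods (as the comment's formula does) and counting the full $9,000/day as the saving:

```python
POD_PRICE = 7_384   # USD per Backblaze pod, as quoted
POD_TB = 135        # raw TB per pod, as quoted
DAILY_COST = 9_000  # USD per day currently being lost


def hardware_cost(data_tb: float) -> float:
    """Pod hardware cost using fractional pods (no rounding up)."""
    return data_tb / POD_TB * POD_PRICE


def payback_days(data_tb: float) -> float:
    """Days of avoided $9k/day costs needed to cover the hardware."""
    return hardware_cost(data_tb) / DAILY_COST
```

That gives about $1.37 million and roughly 152 days, matching the 150-day estimate; it ignores hosting, power, and staff for the pods themselves.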


Did you read the article at all? They can't lease the equipment to anyone else, which is what's costing them money. The servers were estimated to be valued at $1.25M, but they are also unable to lease all the space they are using, which is significant.


Yes, I get that. That's why I said "stop losing money." It seems like their only solution would be to transfer the images of the data to cold storage and then find some source to recoup the cost of the cold storage devices (which is why I thought the backblaze numbers would be interesting.)


I doubt they are legally allowed to do that. Otherwise it would be a simple matter of a lot of tape drives.


Surely it wouldn't take too long to contact both the legitimate users of Megaupload and ask them to download their totally legitimate files?



