Dropbox’s Exodus from the Amazon Cloud

jamwt · on March 14, 2016

I want to point out one other thing about this project, if only to assuage my guilt. And this community of hackers and entrepreneurs seems as good a place as any to clear the air.

There is a necessary abridgement that happens in media like this, wherein a few individuals in lead roles act as a vignette for the entire effort. In particular, on the software team, James and I are highlighted here, and it would be easy to assume we did this whole thing ourselves. Especially given very generous phrasing at times like "Turner rebuilt Magic Pocket in an entirely different programming language".

The full picture is James and I were very fortunate to work with a team of more than a dozen amazing engineers on this project, that lead designs and implementations of key parts of it, and that stayed just as late at the office as we did during crunch time. In particular, the contributions of the SREs didn't make it into the article. And managing more than half a million disks without an incredible SRE team is basically impossible.

james_cowling · on March 14, 2016

Couldn't agree with this more.

One of the things that's not immediately apparent in an initiative like this is the huge amount of effort required to provision, deploy and manage a system of this scale, across a whole bunch of different teams. We couldn't have done any of this without a tremendous group of SREs with the wisdom to invest heavily in automation, in addition to all the hard work.

We'll follow up in a week or two with a blog post highlighting some of the work here. I think the framework for disk remediation (automatically detecting failures, re-provisioning, etc) is particularly interesting.

timdoug · on March 14, 2016

As the SRE writing the blog post on disk lifecycle, I can confirm it amongst others are in flight. :)

tomatoman · on March 15, 2016

Is it true a girl broke your arm while arm wrestling?

james_cowling · on March 15, 2016

Haha, who am I to argue with a story like that?

skuhn · on March 14, 2016

Nice of you to call this out. I think everyone understands that not all of the people involved will get mentioned by name in a Wired article. A project like this clearly isn't a four person operation, and hopefully most people realize there are always a bunch of people working behind the scenes.

My only disappointment is that by trying to set a PR friendly narrative, Dropbox greatly compressed the time spent on the project from inception to completion, and that erases a lot of people's involvement. As someone who worked on MP when it actually was a four person project, it's a little frustrating to finally be able to talk about it after all this time while at the same time the public perception is that the project began after I left the company.

jacquesm · on March 14, 2016

Very classy to be this transparent about what constitutes a team effort. The common narratives that focus on one or a very small number of visible people has a self-reinforcing effect where those key people get more name recognition and in turn that leads to them being even more visible the next time the team accomplishes something. It's rare for team leaders to step forward and to acknowledge the fact that it really was a team effort.

curyous · on March 14, 2016

What does SRE mean?

anp · on March 14, 2016

I assume Site Reliability Engineer from the last time I looked at their openings. I didn't look at the description, but I assume it's a more formal description for devops?

jclaybaugh · on March 14, 2016

devops is not a job title. It is a way of doing things.

debaserab2 · on March 14, 2016

Except it is actually a job title at all sorts of places.

sciurus · on March 15, 2016

And if you google it, you'll see plenty of discussion around why that's controversial.

https://www.google.com/search?q=devops+job+title&ie=utf-8&oe...

pc86 · on March 15, 2016

Whether it's controversial or not has no bearing on the fact that it is at this point a time most certainly a job title.

samstave · on March 14, 2016

"devops engineer" is most certainly a job title. Devops is analogous with "agile" when talking about process...

devonkim · on March 15, 2016

One thing I'll grant Agile is that there's a great deal more standardized technical topics and standards meaning expectations when using the term. With devops, there's a massive difference between how big companies (old, archaic companies by tech company standards) perceive of it compared to how smaller companies do.

My current role was sold to me as a "devops" role with the usual jargon of automation - I can't think of how this role has actually prepared me to pass a serious interview as an SRE at Google, Dropbox, Yahoo, or other tech companies. What really has happened after looking at all the recruiter spam and some of the interviews I've taken on a lark is that the term has become another way of marketing to engineers with modernized skillsets when they can't retain good ones for more than maybe two months (during which time the engineer is interviewing for another role almost certainly) because good engineers tend to run away from these companies for many legitimately founded reasons.

jamwt · on March 14, 2016

http://googleforstudents.blogspot.com/2012/06/site-reliabili...

draw_down · on March 14, 2016

Very cool! Next challenge: shipping without the crunch time and late hours for the team :)

JustSomeNobody · on March 14, 2016

I think most of us figured there'd be more people involved, but it shows lots of class to come post a comment giving credit.

riobard · on March 14, 2016

> Measuring only one-and-half-feet by three-and-half-feet by six inches, each Diskotech box holds as much as a petabyte of data

This number is very interesting. Basically Diskotech stores 1PB in 18" × 6" × 42" = 4,536 cubic inch volume, which is 10% bigger than standard 7U (17" × 12.2" × 19.8" = 4,107 cubic inch).

124 days ago Dropbox Storage Engineer jamwt posted here (https://news.ycombinator.com/item?id=10541052) stating that Dropbox is "packing up to an order of magnitude more storage into 4U" compared to Backblaze Storage Pod 5.0, which is 180TB in 4U (assuming it's the deeper 4U at 19" × 7" × 26.4" = 3,511 cubic inch). Many doubted what jamwt claimed is physically possible, but doing the math reveals that Dropbox is basically packing 793TB in 4U if we naively scale linearly (of coz it's not that simple in practice). Not exactly an order of magnitude more but still.

Put it another way, Diskotech is about 30% bigger in volume than Storage Pod 5.0 but with 470% more storage capacity.

That was indeed some amazing engineering.

atYevP · on March 14, 2016

Yev from Backblaze here -> Yup! It's pretty entertaining! Granted we only store 180TB b/c we use primarily 4TB drives and it's inexpensive for us to do so. If we had 10TB drives in there we'd be pushing up to 450TB per pod, but the price for us would increase dramatically. Pod 6.0 will be a bit more dense!

james_cowling · on March 14, 2016

To be fair to Backblaze this level of storage density is really only possible with recent advances in disk technology (higher densities, SMR storage, etc).

Also not everyone wants to be packing a petabyte into a box. At that level of density you need to invest a lot of effort in replication strategies, tooling, network fabric etc to handle failures with high levels of availability/durability.

jamwt · on March 14, 2016

Yes, a 1PB+ failure domain only made sense because Magic Pocket is very good at automatic cluster management and repair.

samstave · on March 14, 2016

Are you using spindle or ssd or flash? (admittedly I dont know if you consider flash and ssd to be the same)

What is your price per GB raw?

jamwt · on March 14, 2016

> Are you using spindle or ssd or flash? (admittedly I dont know if you consider flash and ssd to be the same)

We have flash caches (NVMe) in the machines, but the long-term storage devices are spindle--PMR and SMR.

> What is your price per GB raw?

I cannot disclose, exactly, but it is well below any other public price I've seen.

voltagex_ · on March 15, 2016

So NVMe is "ready for production"? Do you think you'd use it at home, or only in the enterprise?

jamwt · on March 15, 2016

I mean, if at home you need 400k IOPS, 1GB/s of writes, and 2.2GB/s of reads... go for it! I sure don't, but more power to you. :-)

scurvy · on March 15, 2016

NVMe's lower latency is great. I've never really known how "slow" regular SATA/SAS SSD's were until I used NVMe ones.

voltagex_ · on March 15, 2016

I'm tempted to get one and set it up as some kind of cache in my NAS. I'm already in silly territory with it now, anyway. 1GB/s of writes sounds crazy though - I haven't got anything that can write to it that fast!

stock_toaster · on March 15, 2016

I have a PCIe NVMe card (with a Samsung 950 pro M.2 in it) in my home server, which serves as an L2arc device for a ZFS pool. It is pretty nice. Runs a bit warm though.

voltagex_ · on March 15, 2016

We are now well offtopic but I've seen some advice recently that advises using an SLOG rather than L2ARC for a home NAS.

chronid · on March 15, 2016

I think that a home nas would have a lot of async writes and very few/none sync writes. So a ZIL dedicated device (the SLOG) is not really useful/helpful. I'm not sure even if you really need an L2ARC device, just slap 8/16Gb of RAM on it and you will be happy...

Obviously if you are playing seriously with VMs and databases and whatever else both of those (SLOG/L2ARC) may become important for you, I'm going for the "i'm just using this to store my raw-format photos, backups for the taxes and other big files" kind of usage here. :)

voltagex_ · on March 16, 2016

You don't have any contact details but I'd definitely like to talk to you more about home NAS.

zbjornson · on March 15, 2016

Aside, Google Cloud has NVMe available on almost all machine types if you want to play with it [1].

[1] https://cloud.google.com/compute/docs/disks/local-ssd

toyg · on March 15, 2016

I put it in production, and it's something else. Whether I'd use it at home depends entirely on how rich I feel I am ;)

gbrayut · on March 15, 2016

We have been primarily using Intel NVMe storage for our database servers since fall of 2014 with no major complaints. Our high end desktop/laptop systems are also using the latest Samsung NVMe M.2 drives which are screaming fast.

rwg · on March 15, 2016

Just to put a number to "screaming fast", 2.5 GB/s sequential reads averaged across the entire drive on a Samsung 950 Pro NVMe SSD: https://pbs.twimg.com/media/CaMhc2oVAAE4q3q.jpg:large

jorangreef · on March 15, 2016

Surely it's the other way round?

That is, the better your software can handle repair, the smaller your failure domain can be?

A larger failure domain would only make sense if you wanted to minimize the compute required per unit stored?

jamwt · on March 15, 2016

We do want to minimize that--compute costs money, and storage is irreducible. But you can only reduce aggressively if your larger distributed system is great at repairs, since the failure of a single box kicks off 1PB of repair activity!

jorangreef · on March 16, 2016

Thanks, do you have a time lapse backup of data in Magic Pocket? For example, in case of software error?

ChuckMcM · on March 14, 2016

Not sure what is more amazing, the project of this scale (love the disk drawers!) or that infrastructure for managing the fleet of drives gets top billing!

ju-st · on March 14, 2016

In datacenters space is cheap in comparison to power so often it's simply not economical to cram more in less space.

jamwt · on March 14, 2016

Sometimes; but depreciation costs are usually more than either of those (for storage).

yardie · on March 14, 2016

Did you build your own centers? Running our hosting we had far more constraints on our power than anything else. Even though we only had 20U of equipment we ended up taking a 44u rack. Our disk arrays being the largest consumers. This made the case for moving to SSDs even stronger.

dexterdog · on March 14, 2016

That was the problem I always had in my pre-cloud days. We were usually running racks half-full because of power restrictions in the DC. It was convenient because you could keep you cold spares and tools in your expensive space, but it always seemed like you should be putting more useful stuff in the racks.

jamwt · on March 14, 2016

You can substantially change the economics if you design your own power distribution systems.

scurvy · on March 15, 2016

Aye, but even off the shelf 3ph 50A 208v circuits gets you a lot of power. That's enough for 7 or 8 4U storage nodes with 70 drives each.

Then again, I don't really like putting too much weight that high up in a rack that already has 2600lbs of gear in it.

cm2187 · on March 15, 2016

Is dropbox using SMR? (I never used dropbox. Do they offer incremental backup? Otherwise I would have thought they would need to be able to do random writes, unless SMR is so cheap that it is economical to keep some dead data when a user uploads a new version of a file)

cm3 · on March 15, 2016

There was a post a while back here from a Dropbox employee explaining how they manage SMR disks manually via a direct HBM card and basically doing what the firmware would do otherwise. I can't find the post but it was talking about working around the architectural deficiencies of SMR while still getting to use more disk space.

jamwt · on March 15, 2016

Magic Pocket is a block storage system, and all storage in it is append only. We'll go into more detail on the on-disk format in a blog post. But, yes, we use the SMR disks on an HBA, and we directly address the disk on a LBA (and zone) basis.

jamwt · on March 14, 2016

The difference to get even higher is our system supports host-managed SMR disks that come in 10T and 14T sizes.

jl6 · on March 14, 2016

I've never heard of a 14TB disk drive. Is that what you mean? Who makes these?

Sanddancer · on March 14, 2016

10 TB drives can be gotten off the shelf -- http://www.hgst.com/products/hard-drives/ultrastar-he10 -- and 14 TB drives are probably in the state of availability for large customers; it's not unusual for drive makers to make available drives coming down the pipe to certain customers.

jl6 · on March 15, 2016

True, it's just that they also tend to shout about upcoming drives and boast that they are previewing to select customers. Case in point, HGST marketing is in full swing on that He10 but good luck finding any stock in a mainstream retailer.

jamwt · on March 15, 2016

Yep, vendors do sometimes provide stock to large customers ahead of general availability.

benologist · on March 14, 2016

http://www.theregister.co.uk/2016/03/03/samsung_2_5_inch_15t...

2.5" 15TB ... and SSD, and expected to get a lot bigger too!

jl6 · on March 14, 2016

But that's not an SMR drive.

olavgg · on March 15, 2016

What interface and protocol do your drives have/use? Is it SATA, SAS or a custom interface/protocol?

jamwt · on March 15, 2016

SATA SMR disks, zone management is ZAC.

riobard · on March 14, 2016

Is that in production system now? W-O-W! :O

Sanddancer · on March 14, 2016

Not to knock Dropbox's engineering, because it is sexy, but there are off the shelf enclosures right now that will fit 90 drives in a 4u case. Given that said enclosures have full rack rails and all, and are narrower, I don't doubt that a more densely designed server is feasible in the least with dropbox's design.

james_cowling · on March 15, 2016

These are standard-width rackmount chassis btw. They're 1.5x depth but otherwise it's standard 4U.

The dimensions given are just an approximation.

gshx · on March 14, 2016

There are other concerns to keep in mind, eg. the upgradability of storage to use next gen cards that use 3d nand, etc. I'm sure they thought through a list of concerns before going this route.

jamwt · on March 14, 2016

Hi HN! A couple of us from the Magic Pocket software team are around to answer questions if anyone has some.

anp · on March 14, 2016

The article makes a brief mention of Go causing issues with RAM usage. Was this due to large heap usage, or was it a problem of GC pressure/throughput/latency? If the former, what were some of the core problems that could not be further optimized in Go? If the latter, I've heard recent versions have had some significant improvements -- has your team looked at that and thought that you would have been OK if you just waited, or was there a fundamental gap that GC improvements couldn't close?

Could you comment more generally on what advantages Rust offered and where your team would like to see improvement? Are there portions of the Dropbox codebase where you'd love to use Rust but can't until <feature> RFCs/lands/hits stable? Are there portions where the decision to use Rust caused complications or problems?

jamwt · on March 14, 2016

Good questions, let me try to tackle them one by one.

> The article makes a brief mention of Go causing issues with RAM usage. Was this due to large heap usage, or was it a problem of GC pressure/throughput/latency? If the former, what were some of the core problems that could not be further optimized in Go?

The reasons for using rust were many, but memory was one of them.

Primarily, for this particular project, the heap size is the issue. One of the games in this project is optimizing how little memory and compute you can use to manage 1GB (or 1PB) of data. We utilize lots of tricks like perfect hash tables, extensive bit-packing, etc. Lots of odd, custom, inline and cache-friendly data structures. We also keeps lots of things on the stack when we can to take pressure off the VM system. We do some lockfree object pooling stuff for big byte vectors, which are common allocations in a block storage system.

It's much easier to do these particular kinds of optimizations using C++ or Rust.

In addition to basic memory reasons, saving a bit of CPU was a useful secondary goal, and that goal has been achieved. The project also has a fair amount of FFI work with various C libraries, and a kernel component. Rust makes it very easy and zero-cost to work closely with those libraries/environments.

For this project, pause times were not an issue. This isn't a particularly latency-sensitive service. We do have some other services where latency does matter, though, and we're considering Rust for those in the future.

> Could you comment more generally on what advantages Rust offered and where your team would like to see improvement?

The advantages of Rust are many. Really powerful abstractions, no null, no segfaults, no leaks, yet C-like performance and control over memory and you can use that whole C/C++ bag of optimization tricks.

On the improvements side, we're in close contact with the Rust core team--they visit the office regularly and keep tabs on what we're doing. So no, we don't have a ton of things we need. They've been really great about helping us out when those things have sprung up.

Our big ask right now is the same as everyone else's--improve compile times!

> Are there portions where the decision to use Rust caused complications or problems?

Well, Dropbox is mostly a golang shop on the backend, so Rust is a pretty different animal than everyone was used to. We also have a huge number of good libraries in golang that our small team had to create minimal equivalents for in Rust. So, the biggest challenge in using Rust at Dropbox has been that we were the first project! So we had a lot to do just to get started...

The other complication is that there is a ton of good stuff that we want to use that's still being debated by the Rust team, and therefore marked unstable. As each release goes on, they stabilize these APIs, but it's sometimes a pain working around useful APIs that are marked unstable just because the dust hasn't settled yet within the core team. Having said that, we totally understand that they're being thoughtful about all this, because backwards compatibility implies a very serious long-term commitment to these decisions.

steveklabnik · on March 14, 2016

  > On the improvements side, we're in close contact with the Rust core team

One small note here: this is something that we (Rust core team) are interested in doing generally, not just for Dropbox. If you use Rust in production, we want to hear from you! We're very interested in supporting production users.

anp · on March 14, 2016

Thanks very much for the detailed and thoughtful answers!

I've read before (somewhere, I think) that Dropbox effectively maintains a large internal "standard library" rather than relying on external open source efforts. How much does Magic Pocket rely on Rust's standard library and the crates.io ecosystem? Could you elaborate on how you ended up going in whichever direction you chose with regards to third-party open source code?

jamwt · on March 14, 2016

We use 3rd parties for the "obvious" stuff. Like, we're not going to reinvent json serialization. But we typically don't use any 3rd party frameworks on the backend. So things like service management/discovery, rpc, error handling, monitoring, metadata storage, etc etc, are a big in-house stack.

So, we use quite a few crates for the things it makes no sense to specialize in Dropbox-specific ways.

anp · on March 14, 2016

Cool. This might be getting into the weeds a bit, but are you still on rustc-serialize for json or are you trying to keep up with serde/serde_json? If you're using serde, are you on nightly? From your comment above I got the impression that only using stable features was very important, so I'm curious how your codebase implements/derives the serde traits.

jamwt · on March 14, 2016

We're on rustc-serialize. JSON is not really a part of our data pipeline, just our metrics pipeline. So the performance of the library is not especially critical.

jedisct1 · on March 14, 2016

Are you guys hiring Rust developers by chance? Asking for a friend :)

james_cowling · on March 14, 2016

Very possibly :) Drop me an email at james@dropbox.com.

jabl · on March 14, 2016

How do you do network io with rust? Thread-per-connection, non-blocking (using mio or?), or something else?

jamwt · on March 14, 2016

We have an in-house futures-based framework (inspired by Finagle) built on mio (non-blocking libevent like thing for rust). All I/O is async, but application work is often done on thread pools. Those threads are freed as soon as possible, though, so that I/O streams can be handled purely by "the reactor", and we keep the pools as small as possible.

windlep · on March 14, 2016

Any plans to open-source the futures-based Rust framework? :)

anp · on March 14, 2016

From a parallel conversation[0] on the Rust subreddit:

>Are you going to open source anything?

>Probably. We have an in-house futures-based I/O framework. We're going to collaborate with the rust team and Carl Lerche to see if there's something there we can clean up and contribute.

[0]: https://www.reddit.com/r/rust/comments/4adabk/the_epic_story...

panamafrank · on March 15, 2016

Did you guys use a custom allocator for rust? And if so how did it differ from jemalloc and how could it be compared to C++ allocators like tbb::scalable_allocator?

jamwt · on March 15, 2016

We use a custom version of jemalloc, with profiling enabled so that we can use it.

We also tweak jemalloc pretty heavily for our workload.

ignoramous · on March 14, 2016

Please do a technical blog post on some of these:

1. Go vs Rust at Dropbox's scale and requirements.

2. Maintaining availability whilst moving from AWS to Diskotech and Magic Pocket.

3. Internals of Magic Pocket (file-system, storage engine, availability guarantees, scaling up and scaling out, compression details, load balancing, cloning etc)

4. Improvements in perf, stability, security, and cost.

Thanks.

james_cowling · on March 14, 2016

Yep we're going to get at least a few actual technical blog posts online in the coming month. We haven't got around to writing them yet tho so feel free to surface any requests :)

Will most likely start with the following: 1. Overall architecture and internals. 2. Verification and projection mechanisms, how we keep data safe. 3. How we manage a large amount of hardware (hundreds of thousands of disks) with a small team, how we automatically deal with alerts etc. 4. A deep-dive into the Diskotech architecture and how we changed our storage layer to support SMR disks.

Hopefully these will be of a sufficient level of detail. We certainly won't be getting into any details on cost but we're pretty open from a technical perspective.

(Lemme just take a moment to say how great Backblaze's tech blog posts are btw.)

atYevP · on March 14, 2016

> Yep we're going to get at least a few actual technical blog posts online in the coming month. We haven't got around to writing them yet tho so feel free to surface any requests :)

Here's a request, how did you migrate all that data. Are you going to be open-sourcing any of the tools that you built up? We'd be interesting in hearing about that process. A lot of folks using B2 right now are trying to do this exact thing (or making copies if not actual migrations). Cheers! ;-)

wpietri · on March 14, 2016

I would be especially interested to hear how a focus on actual customer experience was maintained during all of this deep technology work.

One of the ways tech companies wrong is by putting the tech first and the users second. I think Dropbox's early competitive advantage was putting the user experience first. Everybody else had weird, complex systems; you folks just had a magic folder that was always correct. User focus like that is easier to pull off when it's just a few people. But once you break the customer problem up into enough pieces that hundreds of engineers can get involved, sometimes the tail starts wagging the dog and technically-oriented decisions start harming the user experience.

How did you folks avoid that?

atYevP · on March 14, 2016

Yev here -> Let me just say you're welcome and thanks for joining the party :D

gvb · on March 14, 2016

Do you know what Amazon is going to do with their excess 500PB of capacity? Are they scaling up fast enough that it isn't a big deal? Is Glacier selling your vacated storage?

You must have left a large hole in AWS' revenue stream. I assume their fees for transferring your data out of their system helped cover that short-term, but 500PB of data is a lot of storage capacity to sell.

Your other answers indicate your separation with AWS was very amicable, much more amicable than I would have expected. Either "its just business" or they have plans to backfill the hole you left.

discodave · on March 14, 2016

One of the AWS people at Re:Invent mentioned that Amazon is storing 8EB across all storage services (S3, Glacier etc.)

So 500PB is ~6% of that... Although I'm sure Amazon would have liked to keep Dropbox as a customer it's probably a pretty small percentage of their revenue. Also, if you assume that AWS is growing at ~50% year over year then 6% of S3 is only about one or two months worth of growth.

I think it was this talk but I'm not sure: https://www.youtube.com/watch?v=3HDQsW_r1DM

toyg · on March 15, 2016

With storage you're constantly mothballing and replacing old iron behind the scenes, so Amazon might just ditch a bunch of old depreciated drives without hard feelings. AWS being on the up and up for SMEs, they are probably not crying over losing one large but very demanding customer -- and DB are still on AWS for European customers anyway.

djcapelis · on March 14, 2016

Who's part of your security team over there? Most of the big cloud companies have people I trust working for them and I've been pretty comfortable telling most of my clients that cloud services are often going to have a better security team than they will. :)

But now that you've moved away from AWS, have you expanded your security team to help make up for the fact that you need folks to cover all your infrastructure security needs now too? Have you made the hires you needed to deal with the low level security issues in hw and kernel? How have you all been dealing with this so far?

I ask, because I help people make sense of their security issues and my clients explicitly ask me about your company. So keeping track of what you all are doing there is pretty relevant!

jamwt · on March 14, 2016

We have several dozen very, very good engineers working on a few discrete teams--infrastructure security, product security, etc. I don't feel comfortable getting any more specific than that; we keep security information pretty close to the vest.

djcapelis · on March 14, 2016

To be honest, that response is less than comforting. But I understand how that'd be your answer given your position in the organization. Hopefully someone else at Dropbox feels like sharing more at some point and being more forthright about your company's practices in protecting data others entrust to them.

Feel free to refer others to this thread if you think it'd be helpful.

toyg · on March 15, 2016

That's disingenious. The awesome sec teams at AWS or Google are not going to review and maintain your internet-facing VM network config, which is where the real threats are. Sure, their firewalls might be a bit more robust (you don't really know, you can't see them), but anything behind them is just as insecure as you configure it yourself, and that's where penetration happens in real life. Threats at hw or kernel level in a hypervisor are extremely hard to exploit, by the time you get close to that stack you can usually see much easier and juicier targets to harvest.

djcapelis · on March 15, 2016

Sure. So your point is they've always needed a good security team? I'd agree with that. Either way, they should still be open about their security practices and have good answers to the questions I asked.

josh2600 · on March 15, 2016

Still only one shared private key for all of dropbox...

tedmielczarek · on March 14, 2016

I'm curious what your inter-language interop situation looks like. Do you have any Go code calling directly into Rust code (or vice versa)? Are they segregated into standalone programs? If so, are they doing communication via some form of IPC?

Cool stuff, really interesting read!

jamwt · on March 14, 2016

Not yet, but we will very soon. We'll be using Rust as a library with a C ABI, and calling it via CGO.

abhv · on March 14, 2016

Does magic pocket use any non-standard error-correction algorithms, or just parity-style RAID5 or 6?

james_cowling · on March 14, 2016

We use a variant on Reed-Solomon coding that's optimized for lower reconstruction cost. This is similar to Local Reconstruction Codes but a design/implementation of our own.

The data placement isn't RAID. We encode aggregated extents of data in "volumes" that are placed on a random set of storage nodes (with sufficient physical diversity and various other constraints). Each storage node might hold a few thousand volumes, but the placement for each volume is independent of the others on the disk. If one disk fails we can thus reconstruct those volumes from hundreds of other disks simultaneously, unlike in RAID where you'd be limited in IOPS and network bandwidth to a fixed set of disks in the RAID array.

That probably sounded confusing but it's a good topic for a blog post once we get around to it.

pnathan · on March 14, 2016

That is extremely cool. Please tell the author(s) of that system that a stranger on the internet has appreciation for that feat!

willglynn · on March 15, 2016

You might be interested in a paper Microsoft published a few years ago, entitled "Erasure Coding in Windows Azure Storage", which describes some similar concepts in greater detail:

http://msr-waypoint.com/en-us/um/people/yekhanin/Papers/Usen...

H4Y3s · on March 15, 2016

Hey that is cool. I worked on the first commercial disk array at Thinking Machines Corporation in 1988. We used Reed-Solomon encoding because we didn't trust that the drives would catch all the errors (they did). Good work solving some of those bandwidth constraints in rebuilding the data. John (Smitty) Hayes

jamwt · on March 14, 2016

We use Reed Solomon variants with local reconstruction codes.

tikue_ · on March 15, 2016

What's it like working with Rust full-time? Is it similar to using any other language for a job? Do you find yourself enjoying coding more? Do you get unnecessarily caught up in micro-optimizations for things you wouldn't blink at in Go/Java?

bjz_ · on March 14, 2016

Do you use much in the way of formal specifications? I've heard that Amazon currently uses TLA+ to verify the semantics of their systems.

dividuum · on March 14, 2016

Interesting. Here's the paper amazon published: http://research.microsoft.com/en-us/um/people/lamport/tla/fo...

It's only covering a few use cases and doesn't go into too much detail and shows example code but it seems like using TLA+ has been beneficial for them.

luhn · on March 14, 2016

> If a bunch of people shared some files via Dropbox, the company stored the files on Amazon’s Simple Storage Service, or S3, while housing all the metadata related to those files—who they belonged to, who was allowed to download them, and more—on its own machines inside its own data center space.

I'm surprised at this. I would have expected the opposite: using AWS for the application and metadata but managing your own storage infrastructure. Much like how Netflix runs on AWS but streams video through its self-managed CDN. Can you shed some light on why it was designed this way?

jamwt · on March 14, 2016

Building a massive, super-reliable storage system was not really feasible given Dropbox's resources back when it first started up.

pm90 · on March 14, 2016

Not related to Magic Pocket, but when the article talks about Dropbox in house cloud, are they saying that you have your own instance of some minimal cloud-computing services? If you're permitted to disclose, I would love to know whether its based off OpenStack.

crudbug · on March 15, 2016

Which OS and FS the rust code is running on ?

james_cowling · on March 15, 2016

No filesystem - the rust codebase directly handles the disk layout and scheduling.

Our previous version of the storage layer ran on top of XFS which was probably a bad choice since we ran into a few XFS bugs along the way, but nothing serious.

jaysoncena · on March 15, 2016

this is really cool! Can you give more details on how blocks, replications, metadata, etc. are being handled?

james_cowling · on March 15, 2016

Lazy answer but we'll blog about this in the next month.

On a high level it's variable-sized blocks packed into 1GB extents, which are then aggregated into volumes and erasure coded across a set of disks on different machines/racks/rows/etc. We also replicate cross-country in addition to this. Live writes are written into non-erasure-coded volumes and encoded in the background.

The volume metadata on the disks contains enough information to be self-describing (as a safety precaution), but we also have a two-level index that maps blocks to volumes and volumes to disks.

More info to come later.

kdkeyser · on March 15, 2016

Are live writes replicated in real time, or are they locally staged (which is what Facebook does, I believe)

Also, do you mimic the eventually consistent behaviour of AWS, or do you offer a stronger form of consistency?

jamwt · on March 15, 2016

Live writes are written out with 4x redundancy in the local zone, then asynchronously replicated out to the remote zone. Some time later, it is erasure coded into a more space-efficient format, independently in each zone.

barkingcat · on March 14, 2016

The word "cloud" loses all meaning in this article.

"The irony is that in fleeing the cloud, Dropbox is showing why the cloud is so powerful. It too is building infrastructure so that others don’t have to. It too is, well, a cloud company."

Wait ... so using AWS is "cloud", having your own servers is "cloud" too. Everything is cloudy!

newman314 · on March 14, 2016

Remember, there is no cloud, there is just someone else's computer... =)

jimbokun · on March 14, 2016

Was discussing "in the cloud" with my wife yesterday, and I said "It just means on some company's servers." Which suddenly de-mystified the whole concept for her.

deanCommie · on March 15, 2016

I was at PDC in 2008 when Microsoft announced "Windows Azure".

I've never been more confused in my life. "Wait, what does this version of Windows do?"

Luckily they rebranded it as just "Microsoft Azure" which makes a shit ton more sense.

scott_s · on March 14, 2016

Dropbox is clearly a cloud company to its users. Whether or not they use a generic cloud, or maintain their own systems, is an implementation detail to users of the service.

fivesigma · on March 14, 2016

Let us hope that the IoT buzzword doesn't get abused as much as "cloud" has in the past 4-5 years.

azdle · on March 14, 2016

As someone who works for a company whose main business is IoT/M2M/Connected Devices/Embedded Stuff on the Internet, it already has been and it almost meaningless.

jaysoncena · on March 15, 2016

There's already a Cloud over IP! My brain melt down trying to understand what that means.

I hope there will be no "IoT over IP"

wnissen · on March 15, 2016

It is true that there are two concepts conflated in "the cloud". The first, and by far the most important from a technology standpoint, is the ability to grow and shrink pools of storage, servers, databases, etc. simply by plugging and unplugging hardware. Robust failover, dynamic data replication, all these things are necessary before you can begin having cloud services, and in fact, Google and others had this well before there was such a thing as a cloud.

The second aspect is paying on an incremental basis for someone else to provide the pools developed in the first part. But that's more or less a trivial extension once you've got the first.

pm90 · on March 14, 2016

What they probably mean is that Dropbox has its own cloud. I'm assuming that they do have some kind of system similar to amazon ec2 to start and kill vms at will, and other similar basic cloud services.

asendra · on March 14, 2016

So, just for fun. The video quotes "over 1PB" of storage per box.

I count 6 columns of 15 rows/drives. Might be 7 columns even, there's some panels that aren't fully open on the video.

So, 90-105 drives. I'm guessing they are using 10TB drives, although maybe they can get bigger unannounced drives? Roughly, the math seems to check out.

Quite impressive. Guess the Backblaze guys need a Storage Pod 6.0 soon :P (I know, I know, different requirements/constraints)

atYevP · on March 14, 2016

Yev from Backblaze here ->

Yea :D Though if we used 10TB drives we could get about 450TB in one pod, it would just be hilariously expensive for us to do so. Granted that's not 1PB per pod, but we'd imagine our costs are lower. But yes, different use-cases, and we're working on 6.0 ;-)

simonlc · on March 14, 2016

Someone said above that they support 10 and 14TB drives.

echelon · on March 15, 2016

Unrelated to the content of the article--I've never seen the oft talked about "anti-adblocker interstitial" before. I was surprised to find that the website blocks viewing of the article entirely because I'm running Adblock.

It's an interesting subject. I'll never click on or be persuaded by ads, so they're not really gaining anything by showing me ads. I'm essentially worthless traffic to them no matter how they cut it.

I do understand the problem they're trying to solve, and I would like to pay for content that I find worthwhile. A convenient microtransaction protocol where I _don't have to sign in or interact in any way_ would be nice. (I don't want to waste time interacting with their bespoke account system / sign-on, even if it is "frictionless".)

roel_v · on March 15, 2016

"I'll never click on"

Doesn't matter, impressions are valuable to marketers as well.

"or be persuaded by ads"

Yes you will, you just think you don't/won't.

echelon · on March 15, 2016

> Yes you will, you just think you don't/won't.

That's a really blanket statement. I'm rather minimalist, and I typically form negative opinions of the things I see in ads.

roel_v · on March 16, 2016

"and I typically form negative opinions of the things I see in ads."

Sure, until it's something that fits your world view, and then you don't.

Look, I'm usually the first to (excessively) qualify sweeping generalizations I make with 'generally', 'statistically speaking', 'ceteris paribus' and other weasel words. Yet in the context of pervasive marketing, I don't need to - everybody is swayed by some form of advertising, whether you realize it or want to admit it or not; 50 years of research and billions of dollars spend on that research tell us so unequivocally. But if you don't, it's better you start admitting it to yourself - if only because it gives you better insight into your own internal psychological processes.

Then again who am I to give unsolicited paternalizing advice to strangers.

goldenkey · on March 15, 2016

So the ads have an impression on you... dun dun dun.

I saw a comment about the north korea pics earlier yesterday. It was referring to propaganda. Apparently the purpose of censored speech and propaganda wasnt so much to sway opinion and be cloaked as truth, rather it was an irresistable pleasure for rebels to decry and thereby reveal themselves. You might think that a deliberate attitude toward advertising can defeat it but the subconscious does not answer to your control :-)

deanCommie · on March 15, 2016

You are one person. Statistics show most people are not like you.

yread · on March 15, 2016

Yeah this stuff is getting annoying, my organization's firewall blocks ads as they consider them security risk so there's really nothing I can do about it. I guess I should just get back to work sigh

darkarmani · on March 15, 2016

I ended up refreshing, select all, copy and pasting it into my editor. That resulted in me spending no time on their page.

TimJRobinson · on March 17, 2016

Disappointing that you can't even view it with "Disconnect" enabled. I'm ok with seeing ads, but not ok with sites sending my data off to 15 different advertising companies every time I load a page.

According to disconnect that one page sends data to 15 different advertising sites, 15 different analytics services and many other content services. So much for fighting for their readers privacy online.

ec109685 · on March 15, 2016

How do you know you are never influenced by the ads you see?

Also, Google Contributor network let's you block any DoubleClick ad, while still paying the publisher for the page view, so that might get what you want.

driverdan · on March 15, 2016

Switch to uBlock Origin. I didn't see any such message.

hsod · on March 15, 2016

This was the first time I bumped into it as well. It was a bit jarring at first, but easy enough to whitelist. I think it's a reasonable request.

anp · on March 14, 2016

steveklabnik also pointed out on the Rust subreddit that there's a Dropbox blog post covering a few more technical items:

https://blogs.dropbox.com/tech/2016/03/magic-pocket-infrastr...

myth_drannon · on March 14, 2016

Now that Golang is making headways at Dropbox, I guess Python codebase will diminish in its importance... I wonder how Guido Van Rossum feels about it. First he was at Google but they created Go and he left to Dropbox which was a large Python shop, now they are moving to Go too.

james_cowling · on March 14, 2016

I probably wasn't sufficiently clear in my other message re Golang:

- Python is our primary development language for most stuff at Dropbox.

- Mobile development and UI code use whatever is appropriate for each platform.

- Go is the primary language for Infrastructure, meaning fairly deep-backend stuff: databases, storage systems, message pipelines, search indices etc.

- Rust is used on Magic Pocket (which is within Infrastructure).

- We still have a bit of C++ code floating around but there's not much new development in C++ within Infrastructure. C/C++ are still used for common libraries for mobile development however.

hakcermani · on March 14, 2016

Its an amazing remarkable feat indeed to move all those bits live to another location ! But seems contrary to what Netflix did (move everything to AWS ! - http://www.eweek.com/cloud/netflix-now-running-its-movies-ex...) I guess they have their specific usage and reasons ?

jamwt · on March 14, 2016

Both companies control the technology that most impacts their business.

Dropbox stores quite a bit more data than Netflix. Data storage is our business. Ergo, we control storage hardware.

On the other hand, Netflix pushes quite a bit more bandwidth than almost everyone, including us. Ergo, Netflix tightly manages their CDN. That's where they really focus their technical work. Their storage footprint is smaller and less important to optimize to the extreme.

turingbook · on March 15, 2016

I hated the writing style of Wired guys. The blog from Dropbox is much more clear: https://blogs.dropbox.com/tech/2016/03/magic-pocket-infrastr...

MichaelRenor · on March 15, 2016

The open-source alternatives to S3 have a long way to go, especially since AWS S3 is so simple and reliable. I ran Openstack swift in production once upon a time and I remember the entire system coming down because the the rsyslog hostname did not resolve. Ouch. I'll let somebody else work out those bugs.

steveklabnik · on March 14, 2016

A short mention of their use of Rust appears!

cj0 · on March 16, 2016

Some requests for the DropBox Diskotech tech blog series: For which datacenter cooling system are the Diskotech storage pods designed? Hot aisle cold aisle? How evenly can the disks inside the Diskotech be cooled? With this question I try to ask, how good is it in keeping all drives cool? For example for the theoretical case where all disks in the Diskotech are dissipating identical heat, how much variation is there in drive temperature at roughly 25ºC intake temperature? How much power is consumed for cooling (despite that it will be fractionally low compared to what the disks will consume)? Is there any drive idle optimization done or is the expectance that not any disk will reach its idle (non-rotating) state ever? Is there any optimization done inside a Diskotech to reduce the (inrush) current surge when connecting the machine to power?

Spooky23 · on March 14, 2016

I'm a recent Dropbox Pro subscriber -- I've been a user for many years.

Really glad to see what's been going on from behind the scenes, and I'm looking forward to seeing what the future will bring to the front end of Dropbox.

Thanks guys!

jamwt · on March 14, 2016

Awesome! Thanks for being a customer. We'll continue to work hard to make Dropbox better and better.

iskander · on March 14, 2016

Q: Did the Dropbox devs working on the second version of Magic Pocket encounter any language stability issues with Rust? Did updates to the core language ever break existing code?

jamwt · on March 14, 2016

We probably "pulled forward" every two months or so. Back in early 2015, sometimes we'd have breakages that took a bit of work to remedy. But post 1.0, the language has been very stable.

As usual, we actually have more challenges pulling the libraries forward than the language.

bane · on March 15, 2016

From a business sense, this is a great move. Amazon's cloud storage offerings are pretty expensive, even with various deduping strategies in place, Dropbox needed both store and move in and out massive amounts of traffic. If DB had reached the point where they could staff the extra over head of doing it themselves, and figure out how to spread the hardware cost out well, it will give them more pricing flexibility in the near future.

virtuallynathan · on March 15, 2016

It looks like the servers they make use of are purchasable, part of the Dell DSS series - the DSS7000: http://downloads.dell.com/manuals/common/dss%207000_dss%2075...

90x 3.5in disks, 2x compute nodes with 2x E5-2600v3 CPUs.

jamwt · on March 15, 2016

We actually use several different servers from different vendors, and some of them are more customized than others.

virtuallynathan · on March 15, 2016

Ah, yes I spot the 78 bay Quanta one: http://www.qct.io/Product/Storage/Storage-Server/4U/New-Quan...

Can't identify the 3rd one yet...

EDIT: Found it... HPE Cloudline CL5200, 80 drive bays. http://www.nextplatform.com/wp-content/uploads/2016/03/hpe-c...

the_watcher · on March 14, 2016

I worked at a company that moved off AWS and the changing a tire on a moving car is exactly how the devops team described it.

JackPoach · on March 15, 2016

I wonder how capital intensive the move is. Isn't Dropbox overall still unprofitable as a company?

bluedino · on March 14, 2016

>> Transferring four petabytes of data, it turned out, took about a day.

46GB/s, is my math right?

james_cowling · on March 14, 2016

We were pushing over a terabit of data transfer at peak, so 4-5PB per day.

atYevP · on March 14, 2016

Yev from Backblaze here -> we'd be curious to hear how! A lot of folks are looking for quick migration assistants for our B2 service, that's pretty fast!

james_cowling · on March 14, 2016

We have a big fleet of what we call "backfill nodes" which handle the transfer but the most difficult component of the transfer is just the network capacity. We worked pretty closely with Amazon here since we have a long-standing relationship and a lot of data transfer between the two companies.

nilayp · on March 14, 2016

Any plans on blogging about this and open sourcing your tools? From our vantage point, this would be hugely popular.

jamwt · on March 14, 2016

We might blog about it, these particular tools probably aren't very amenable to open sourcing. They've heavily dependent on in-house Dropbox infrastructure.

james_cowling · on March 14, 2016

Btw. that was a typo, I meant over "half" a terabit of data transfer. You get into bigger numbers than this when you start talking about aggregate traffic.

virtuallynathan · on March 14, 2016

i.e. 1Tbps? Jeez! That is crazy.

josh2600 · on March 15, 2016

If you are moving bits from $External_DC to AWS, your best bet is to colo inside Equinix (or equivalent) and use their cross-connect to AWS service. If you have 100 fiber interconnects between you and AWS, you can move a lot of data (it becomes an interesting programming problem to figure out how to saturate that much fiber consistently).

In terms of your customers getting data to you, again, fiber interconnects inside the DC are your best bet. Colo'ing is the fastest way to move bits because it relies on arteries, not capillaries, for moving bits from place to place (no last mile problem).

jamwt · on March 15, 2016

Yes, balancing links and distributing work w/out congestion collapse became a really interesting and nontrivial problem.

josh2600 · on March 15, 2016

I can't even imagine.

Did you guys get into custom network hardware or still a Juniper/Cisco shop?

scurvy · on March 15, 2016

Juniper QFX from what I've heard.

photonwins · on March 14, 2016

I am curious to know, with so many disks densely packed in a 4U configuration, I am guessing there is definitely increased heat generation and not to mention vibration. How do you handle these? Also, does it have any effect on MTBF?

jamwt · on March 14, 2016

Yeah, our hardware teams put a lot of qualification time into hardware profiles before we buy in bulk. And part of what we test is power utilization, heat, etc, under normal software loads vs extreme loads, to make sure we don't exceed power budget, don't spin everything up at the same time, etc. This process can take months.

A lot of the system design is about talking closely to vendors about MTBF for specific loads, and we use a lot of metric data to characterize our loads accurately in those calculations. Then, of course, we measure once we're in the field to make sure we were on target, so the next round of estimations is better.

scurvy · on March 15, 2016

It is indeed scary to watch a storage node pull 1.5-2 kW at power-on. I think my eyes popped out the first time I watched the PDU load dance around like that.

dmitrifedorov · on March 15, 2016

I'm done with Wired: cancelled its subscription a couple months ago (after many years), now they demand I login to read their web site because I use ad blockers. Done!

jessegreathouse · on March 15, 2016

Is there a link without the anti-adblock crap?

bartvk · on March 15, 2016

You can also save it with Pocket http://www.getpocket.com/

ldom66 · on March 15, 2016

Unrelated to the article but these portraits are really beautiful! Props to the photograph and studio.

ccannon · on March 15, 2016

I applaud your efforts at answering all these questions and providing very detailed responses.

wchrisn · on March 21, 2016

Can you share your development practices in rust like TDD, Automated Builds etc

max_ · on March 15, 2016

whats the "brand new programming language" the writer meant ?

yberreby · on March 15, 2016

It is Rust[1]:

> So, in the middle of this two-and-half-year project, they switched to Rust on the Diskotech machines. And that’s what Dropbox is now pushing into its data centers.

See https://www.reddit.com/r/rust/comments/4adabk/the_epic_story... for more information.

[1]: https://www.rust-lang.org/

monomaniar · on March 28, 2016

Awosome

ignoramous · on March 14, 2016

Key points:

1. Dropbox moved from AWS to its own datacenters after 8 months of rigourous testing. They didn't exactly build a S3 clone, but something tailored to their needs, they named it Magic Pocket.

2. Dropbox still uses AWS for its European customers.

3. Dropbox hired a bunch of engineers from Facebook to build its own hardware heavily customised for data-storage and IOPS (naturally) viz. Diskotech. Some 8 Diskotech servers can store everything that humanity has ever written down.

4. Dropbox rewrote Magic Pocket in Golang, and then rewrote it again in Rust, to fit on their custom built machines.

5. No word on perf improvements, cost savings, stability, total number of servers, amount of data stored, or how the data was moved. (Edit: Dropbox has a blog post up: https://blogs.dropbox.com/tech/2016/03/magic-pocket-infrastr... )

6. Reminds people of Zynga... They did the same, and when the business plummeted, they went back to AWS.

7. Not a political move (in response to AWS' WorkDocs or CloudDrive), but purely an engineering one: Google and then Facebook succeeded by building their own data centers.

jamwt · on March 14, 2016

> Dropbox rewrote Magic Pocket in Golang, and then rewrote it again in Rust, to fit on their custom built machines.

Actually, full disclosure, we really just rewrote a couple of components in Rust. Most of Magic Pocket (the distributed storage system) is still written in golang.

> No word on perf improvements, cost savings, stability, total number of servers, amount of data stored, or how the data was moved.

Performance is 3-5x better at tail latencies. Cost savings is.. dramatic. I can't be more specific there. Stability? S3 is very reliable, Magic Pocket is very reliable. I don't know if we can claim to have exceeded anything there yet, just because the project is so young, and S3s track record is long. But so far so good. Size? Exabytes of raw storage. Migration? Moving the data online was very tricky! Maybe we'll write a tech blog post at some point in the future about the migration.

james_cowling · on March 14, 2016

Yup, Dropbox Infra is mostly a Go shop. It's our primary development language and we don't plan to switch off any time soon.

We're always about the right tool for the job tho, and there are definitely use cases where Rust makes a lot of sense. We've been really happy with it so far.

SeanDav · on March 14, 2016

Without getting into any sort of religious war, what are typical use cases that make Rust a better choice?

james_cowling · on March 14, 2016

I think @jamwt did a pretty good job of explaining this in https://news.ycombinator.com/item?id=11283688.

At a high level it's really a memory thing. There are a lot of great things about Rust but we're also mostly happy with Go's performance. Rust gave us some really big reductions in memory consumption (-> cheaper hardware) and was a good fit for our storage nodes where we're right down in the stack talking to the hardware.

Most of the storage system is written in Go and the only two components currently implemented in Rust are the code that runs on the storage boxes (we call this the OSD - Object Storage Device) and the "volume manager" processes which are the daemons that handle erasure coding for us and bulk data transfers. These are big components tho.

moomin · on March 14, 2016

The really big deal is memory management. There's no GC, and there's pretty precise memory control, and it's not the Shub-Niggurath that dealing with C++ is.

In fact, just think of Rust as C++ for mortals. :)

rbranson · on March 14, 2016

Curious about the specifics with how you interfaced Go with Rust.

jamwt · on March 14, 2016

Right now, they talk over RPC. In the future, we might embed Rust libraries as C libraries into go and other languages.

mandeepj · on March 14, 2016

Was not earlier it was Python? I guess Dropbox even hired Guido van Rossum due to this.

chirau · on March 14, 2016

What happened to Python?

jamwt · on March 14, 2016

We still have a ton of Python. James is referring to the infrastructure software layer--storage services, monitoring, etc. That particular part of Dropbox is mostly Go.

But our web controller code, for example, is millions of lines of Python. And we use Python in lots of other places as well.

illumin8 · on March 14, 2016

I had to count carefully, but it appears you're claiming 12 9's of durability for magic pocket, which seems like a subtle jab at S3's 11 9's.

I think both durability numbers would be fine for customers, but I've also wondered about the math behind AWS S3 durability for a while, and how you would prove statistically that 11 9's was even possible.

toomuchtodo · on March 14, 2016

I'd imagine that if Backblaze is realizing great cost savings building their own storage pods from commodity off the shelf components, Dropbox's costs we're even lower per GB. Well done.

atYevP · on March 14, 2016

Yev here -> One of the reasons we started opening up about our Backblaze Pods and Hard Drive stats was so others would do so. We approve this message :D

james_cowling · on March 14, 2016

This is most accurate. A couple of comments:

> Dropbox still uses AWS for its European customers.

We haven't publicly launched EU storage yet but will be doing so later in the year.

> Dropbox hired a bunch of engineers from Facebook to build its own hardware heavily customised for data-storage and IOPS (naturally)

Facebook and Google and startup folks and people from random other places.

Our IOPS demands are reasonably modest on a per-disk basis which is why we skew heavily towards storage density. This is a different workload than you'd find at some of the other big players, hence a fairly custom solution.

> Not a political move

Definitely. AWS are great, we love working with them and will continue to do so.

discodave · on March 14, 2016

How much of your cost savings was due to the lower IOPS requirement?

Also, was the S3 "infrequent access" tier a response to customers like you or was your special bulk pricing already taking into account your low IOPS demands?

Thanks!

honeyman · on March 14, 2016

Thanks a lot for summarizing it for those who could not read the article due to the Wired adblock-preventing nag screen!

remedialhedge · on March 14, 2016

At the end of the blog post, they mention that they're working on creating the ability for European companies to store data in Germany if requested. This is interesting as the implication is that European businesses don't see the US as a safe place to store their data anymore.

gfosco · on March 14, 2016

It's not about safety, it's about EU privacy laws and Safe Harbor. It is desirable to store the data in-country to avoid the hassle of requiring/finding a provider with an EU approved Data Protection Agreement.

mathattack · on March 14, 2016

It would seem to me that if you're primarily in the business of storage, you should go down the stack as far as you can. Outsourcing to AWS leaves you at parity to anyone else who can do the same.

If Box is in the "Business Services" business, then perhaps it's different.