DigitalOcean leaks customer data between VMs (github.com/fog)
395 points by sneak on Dec 30, 2013 | 200 comments



TL;DR: In the DigitalOcean web panel you can check the "scrub data" checkbox when destroying a VM. When destroying via the API, this option is off by default. This can lead to other customers being able to retrieve your data.

The author thinks that this is a security issue because this option should be enabled by default. However, (I assume) it's not in DigitalOcean's interest to do a full disk scrub because it reduces the lifespan of their SSDs.

If a user forgets to log out of Facebook on a public computer, is it Facebook's responsibility? Similarly, if a user does not correctly delete data on a budget host, is it the host's fault?


The pro-DO bias of HN is showing again. If this was Amazon or Linode, there would be endless hand-wringing and calls for lawsuits. When it's DO doing something incredibly stupid and short-sighted, it's 'blame the customer.' This is penny-pinching at the cost of security, which is something we often complain about when it comes to other companies. Why does DO get a free pass?

"Oh, there's an option for that" to stop default dangerous behavior should not be excusable. This is further proof that DO isn't enterprise-ready and is still a toy for barely stable dotcoms.

Thank god for EU privacy laws. If these guys have any EU presence, they'll be forced to clean up their act via regulation. Clearly, the invisible hand of the market and "there's an option for that" is the failure it's always been when it comes to security.


I was with you until you started talking about the "invisible hand of the free market" failing and how instead we should be using State force to solve these problems. Isn't your comment a free market action, raising awareness about this problem? Won't this thread cause DO to take action? Say it doesn't. There are plenty of other hosting companies to switch to. And also there are times when I might be willing to trade security for convenience, price, practicality, etc. As a consumer, I want this choice. I think the free market is working fine. You can't bully your way to good security.


Glad somebody pointed this out. It's also hilarious to me that we were all screaming for heads with the GD (mt) acquisition a few months back, and this is just a "pat on the back" type of thing.

Their growth and offerings are great, but they were going to run into problems sooner or later. As we've seen with many other hosts in the past, this is probably just the beginning.


They have a datacenter in Amsterdam


Wait a second. Why does this option exist at all? Are you saying there are people who DO NOT WANT their data destroyed when they destroy a VM? Who are these people?


If you care about your data then the erase will not protect you enough anyway; if you think it does then you've probably misunderstood the underlying architecture a little.

The data for your virtual machine's virtual disk(s) will not always be in the same place; it may get moved around the storage volume or moved between storage volumes, and this will be completely transparent to you. It may also get stored on backup volumes too. When you "secure wipe" the disk as you kill the VM, all you are doing is wiping the data in its current location, not any latent copies that may be sitting elsewhere (as backups, or "ghost" data sitting in currently unallocated bits of physical media).

The only way to guarantee your data is stored securely and is gone securely when you want it gone is to use full-volume encryption from the start and make sure that the keys are never stored at the provider side (this does mean that you need a mechanism by which you can provide the key(s) to the VM whenever it reboots for some reason). That way there is no need to wipe the current store or worry about ghost data elsewhere: just destroy all copies of the keys for those volumes. Of course there is still some risk, as the keys need to be in RAM somewhere so the encrypted volumes can be accessed at all, but once you get to that level of concern you can only be sure you are secure by having your own physical kit.

Of course not all current hardware-sharing solutions can support full-volume encryption, as you don't really have a proper volume that you can encrypt and put a filesystem in, just a part of a larger filesystem pretending to be one (and if you can't mount things, you can't use a file-based volume instead)...

tl;dr: wiping the VM's disks does NOT protect you from this sort of thing; properly implemented full-volume encryption will (as much as is possible on a shared virtualisation host, if your host's solution can support FVE in the guest VMs/containers at all).
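
A minimal guest-side sketch of that approach, assuming your host gives you a real block device to work with (device and mount names here are only illustrative):

    cryptsetup luksFormat /dev/vda2        # key/passphrase held off-provider
    cryptsetup luksOpen /dev/vda2 data     # supply the key at each boot, e.g. over SSH
    mkfs.ext4 /dev/mapper/data
    mount /dev/mapper/data /srv
Destroying every copy of that key is then as good as a wipe, wherever latent copies of the underlying blocks end up.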


You make it sound like there's no valid point of security in between "anti-the-next-random-guy" and "anti-sophisticated-governments." There are a lot of people who care about their data enough that it shouldn't be left lying around for passersby, but not so much that they're going to implement FDE. I'd call them "normal people."


The suggestion of using TRIM does not (in all cases) protect against accidental data visibility to the "next random guy"; it doesn't even protect from some other current random guy, as your container could have moved and left another user with access when their new container got created over the space you used to occupy. So unless you have another method that hasn't been discussed yet, there isn't a point in between: you protect yourself from both or you protect yourself from neither.

If you need to be sure your instructions to wipe data do protect you from that random other guy then you either need to use good encryption (it doesn't have to be full-volume encryption, but once you get to the point of caring it is probably easier to go the whole hog than do it piecemeal) or you need to have dedicated physical storage (something a VPS provider generally doesn't offer).

Of course the provider could institute full data wiping for all relevant operations, but that imposes I/O load that will affect all other users of the given host machine(s) unnecessarily (I say "unnecessarily" because most won't care, as their bulk data is not sensitive and they've invalidated associated things like SSH keys anyway; the host will take the attitude that if your data is sensitive you need to take measures to protect it).


It's probably there for DO's sake. SSDs have limited rewrite cycles, and each secure wipe will definitely shorten the lifespan of the drive much more than a simple quick wipe. This translates into cost savings.


Yes. This is a shitty default peddled as a benefit for the customer, when in fact it benefits DO.


No other provider charges you extra to erase your data when tearing down an instance.

It's called a dark pattern.


Okay, the misinformation is getting a little out of hand here. DigitalOcean certainly aren't charging you to erase your data, all you have to do is check the "Scrub Data" checkbox. Yes, you have to pay for the time your VPS is on while erasing, but that's perfectly reasonable.


Why would that be perfectly reasonable? They have an obligation not to make my data available to other people; I don't give a rat's ass about the technical details of doing that.


Most other providers use hard drives, and they're different from SSDs.


"when in fact it benefits DO"

I could argue that it benefits the customer as well.

If it costs DO more (because of disk wear-out) then in theory they would have to raise prices in order to pay for their higher costs. And those higher costs could then disrupt the balance in pricing that makes them attractive.

My point is that, setting the numbers aside, you can't say it only benefits them. Being able to offer lower prices (because your expenses are lower) benefits the customer as well.


If they can't afford to use SSDs properly they should not use SSDs.


Worst cost-saving measure ever.


It's much faster and there are certainly people who don't care because the data is worthless to anyone but them.


in other words, the only people who are affected are those who don't know the importance of scrubbing their data?


Come on people! DO is a developers playground ($5/month!!!!) not a place to host sensitive production data.


Sensitive is a relative term. You do development, you install a private key on the VM, you grant that key access to your system in order to test a new integration prototype, you delete the VM before you disable the key - and voila, you've got a private key with access to your systems in the hands of a random stranger. Of course, the solution would be to disable the key's access first - but how many devs are trained in security procedures to this level? How many can follow them flawlessly over repeated tedious cycles of development and resist the temptation to take shortcuts?

Proper procedures should make it easy for people to do security, otherwise they are part of the problem.


"Proper procedures should make it easy for people to do security, otherwise they are part of the problem."

Correct. And leaving the proper procedure in other people's hands is the problem. The best procedure [in testing a new integration prototype] would be to create a new special test key, properly track and maintain it so that it is only enabled during the known testing time frame, and then properly "dispose of"/"revoke" it in the system YOU control. We can argue all day that DO should scrub the data by default and prevent new instances from low-level access... however security relies on trust. So, do you TRUST (and KNOW) that checking the "scrub data" checkbox actually works and does what it says it does? If it doesn't, it's beyond your control. And if you care about proper procedures and security you either a) don't use systems/servers beyond your control (if you are that concerned...) and/or b) rely on proper procedures and things you can and do control. I don't know what happens in DO's infrastructure... maybe there is some other leak I'm unaware of, so to limit any damage for testing I will properly create and control short-lived test keys.


I use DO for bootstrapping and trying out new ideas, programs & co. There is no information I can think of that I would want wiped off the SSD.


Source code? SSL private keys? Private keys used for deploying code or configs from Github? New Relic license codes? AWS access keys? Pcap dumps? Database passwords? Deleted blog posts? irssi cybersex paged out to a swapfile?

Seriously, if there's nothing on your cheap sandbox server that you don't want published, you're probably not using it in the first place.


Reading your comments in the link, and comments in this thread, I find you to be alarmist at best.

Yes, this issue needs to be raised. No, you can't say that anyone who doesn't agree is a non-factor.

You have the option to scrub your data anytime you like; I've always understood that wasn't the default on DO.


Then certainly you don't mind giving me your root login?


To a server you have discarded?


If it was a lifespan issue they could just encrypt the partitions (minimal overhead with AES) and then destroy the key with the droplet.

Due to the way that SSDs wear level, even if every block returns 0, you don't know if that's the same block or if the real data is squirrelled away on out-of-life blocks. From what I understand the SSD has a lot more capacity than advertised and moves the blocks forward as sections of the chip wear out. Assuming you could reprogram the controller (or use a different one), you could go back and read the old blocks and recover data in the clear.


Not several times more capacity; it's generally about 7% (that's why you often see 120 GB or 240 GB drives that are in reality 128 GB or 256 GB).


You're right. Encryption is the right answer. What does Amazon do with their SSDs? Anyone know?


I would expect any sane large provider to zero the disks simply to make it easier to overcommit on the storage layer.


The problem with encryption at that level is that TRIM no longer works, as there are no longer any zeroed blocks - so writes are slower. Also there will be more write amplification, as every write will cause a block to be erased.


Of course it works. The filesystem knows which blocks the underlying storage device can forget and tells it using TRIM. That the content of the block was encrypted is irrelevant. TRIM is for putting blocks in some undefined state when they're no longer needed, not for zeroing them.

See http://worldsmostsecret.blogspot.com/2012/04/how-to-activate...
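
A minimal sketch of what enabling it looks like, assuming dm-crypt/LUKS on a reasonably recent Linux (device names are placeholders):

    # open the container with discard passthrough (cryptsetup >= 1.4, kernel >= 3.1)
    cryptsetup luksOpen --allow-discards /dev/vda2 cryptroot
    # then issue TRIM in batches rather than on every delete
    fstrim -v /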


Oh it looks like I'm a bit out of date, it took years but I can see they now support this. Of course this comes at the cost of being less secure (giving away the used block locations), though it shouldn't matter for this use case.


I'm surprised I didn't see anyone raise the issue of Data Protection given DO's European presence. This comment below [1] appears to confirm that customer data is mishandled in violation of EU Data Protection law. When you ask 'is it the hosts fault?' I think the answer is a most definite YES.

Surely some lawyer out there took a look at this, so maybe I have missed something, but this looks like a big problem to me.

From my own experience using DO I can say I'm a happy customer and I plan to keep using them. I tick that box when I've used anything remotely sensitive in a VM when destroying it, and leave it empty other times (like when I've made a mess experimenting with something and want to quickly trash & re-create a droplet).

[1] https://news.ycombinator.com/item?id=6983260


DO is a data processor under EU data protection law, while the customer would be the data controller. EU data protection law currently (it will change with the new regulations) only imposes legal duties on the data controller. As such, it is the customer's legal problem if it (or its data processor) has failed to handle personal data correctly.


> Similarly, if a user does not correctly delete data on a budget host, is it the hosts fault?

What does the verb "destroy" mean?

Screenshot taken today (30DEC13) from a fresh blank Digital Ocean server I just provisioned:

http://i.imgur.com/fJOxRN9.png


What are we looking at in your screenshot? I've been Where's-Waldoing it for the word 'destroy' but can't find it.


That's someone else's data (three someone elses' if you count the iPhone end users) read from the root block device on a minutes-old brand new fresh Digital Ocean VM that I got from them for a $5 PayPal payment. It had been mkfs'd but not zeroed.

Command was:

    apt-get -y install binutils ; dd if=/dev/vda bs=1M | strings -n 100 | grep 2013-12
The destroy api call docs are here:

https://developers.digitalocean.com

(It's the /droplets/[droplet_id]/destroy one.)


I think he's showing that the data is not "destroyed" in the sense it still exists. Destroy in a virt context doesn't necessarily mean "destroy all the resources associated with a VM"--I don't know about DO's product offering, but at the hypervisor level, at least with Xen and libvirt, you often want to "destroy" the instance (forcibly terminate/undefine from the hypervisor) and leave the resources (storage pools/devices, IP pools/addresses, network flows/filters etc). I think focusing on the word "destroy" is a bit of a canard; the real problem is insecure defaults wrt block device scrubbing when you issue an API "destroy" (which wouldn't be any better if it was called "delete" or "undefine").


nice post - also #fmc says hi ;)


The term "disk scrub" makes little to no sense in terms of SSDs, which makes this issue even more complicated. People who work in the data recovery field have been dealing with this since SSDs became popular.

An SSD is limited by its number of writes. To compensate for this, the SSD has very complicated onboard logic that abstracts the actual flash away from what it tells the OS. This allows it to do certain tricks to save writes. However, when you are "scrubbing" an SSD, internally the SSD might be writing somewhere else entirely. Scrubbing is not considered an effective way of wiping SSDs, from what I understand.


There is a vast difference between writing out zeroes to the SSD but still having some of the original data potentially persist on the SSD but unreachable without special techniques, and not zeroing out the SSD and giving the device to a new VM and letting it trivially access everything that was previously there.

If I can provision a new VM and cat /dev/vda and see data from the VM that previously occupied that spot, then you are doing it horribly, horribly, horribly wrong.

That zeroing out the data leaves open a different and vastly more difficult attack path doesn't make that any less true.


Ok, so the data isn't necessarily destroyed immediately after a scrub. But how does that play out at the level of VM users? Is there a normal usage scenario where the portion of the SSD containing my deleted data is made readable to someone else's VM, or will it be inaccessible to normal VM users until it's overwritten?


I'm not an expert on this, but I believe that at the VM user level they would see wiped data because of the internal mapping. I think physical analysis of the drive would be required.


They don't have to do a full disk scrub on the physical SSD to fix this security problem. All they have to do is what every sane sparse disk image implementation in the history of mankind has done: catch reads from unwritten blocks in software and return a block of all zeros.
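
That is the behaviour you get for free from any file-backed sparse image; a quick illustration (path is hypothetical):

    # a freshly created sparse/qcow2 image is logically all zeros, no matter
    # what bytes happen to sit on the underlying physical store
    qemu-img create -f qcow2 /var/lib/droplets/droplet.qcow2 20G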


I think it's less in Digital Ocean's interest to silently give customer data to a subsequent user of that block device?


Why?

If customers don't care enough about the data why should DO?

DO is a budget provider. Their main hook was cheap SSDs, and it is still low cost. The billable time it takes to wipe a VM adds to the cost. If you want the data zeroed... then pay for it; otherwise it must not be important enough.


DO is essentially saying they're using insecure defaults by design, which is a Really Bad Idea, even if it prima facie saves time/money. It's relatively easy for them to post a point in a FAQ about why you're billed for x minutes after an instance is destroyed, and include the request params/headers to avoid it. On the other hand, it's effectively impossible to unleak an inadvertently leaked apple developer signing key because an unwitting customer didn't rtfm.


There are a couple of things I wanted to say, and I can speak with some authority on the subject as I speak on behalf of DigitalOcean.

This was mentioned to me on Twitter hours ago, prior to this post. The first thing I said is that most people these days understand the importance of responsible disclosure, and that we take all security issues very seriously. Not following responsible disclosure with a company such as DigitalOcean is extremely irresponsible, and I would be remiss not to point out that if anyone does ever find a software vulnerability, filing it and waiting 24 hours for the appropriate response is preferred. - https://www.digitalocean.com/security

As far as I can tell here, there is no unexpected behavior that isn't documented or stressed. In both our API documentation and our control panel, we note that you must either pass a flag or check the box to securely delete the data. As far as I can tell, the flag is currently functioning correctly. So:

Is the complaint that customer data is being leaked from VMs? That the flag being passed via our API/Dash isn't actually working? Or, that our policy on not doing a secure delete by default isn't something you agree with?

j.


Any company that has ever stored data on DigitalOcean now needs to operate under the assumption that other DigitalOcean customers have accessed it.

Even if every staff member believes they checked the "Scrub Data" checkbox or used the API flag when destroying droplets, human memory is unreliable and people make mistakes.

This is a very serious security issue and it's appalling that anyone is making excuses for it, and it's even more appalling that the company responds by blaming customers.

Customers should not be able to access other customers' data under any circumstances. It shouldn't even need to be stated that providing access to other customers' data should not be the default.


"Or, that our policy on not doing a secure delete by default isn't something you agree with?"

This one. You have chosen a default that fails deadly. It's like designing a car that explodes when you turn it off. Oh, there's a button over here you can push to disable the explosion feature. That doesn't really make it better.

You've created an option that can and will deeply screw many of your users. The mere existence of the option is not wrong by itself. But the fact that it can and will so easily screw so many people means that the option needs to have lots of flashing warning lights around it and it needs to be on by default.

I just checked out the "Destroy" tab for my droplet. There is absolutely no, none, zero indication that failing to check this box will allow the next person to occupy my spot to read all of my data. Here is the exact text:

"This is irreversible. We will destroy your droplet and all associated backups."

"Scrub Data - This will strictly write 0s to your prior partition to ensure that all data is completely erased. Estimated Destroy Time: 11 minutes 22 seconds"

I would expect "destroy your droplet" to mean that the data gets destroyed. I would expect the "scrub" option to be for paranoid people worried about the FBI seizing your equipment and using electron microscopes to extract residual data. At no point does anything in here give me any expectations that the default is "hand over all of the data currently on the VM to the next random stranger who walks in the door".

Do you really speak on behalf of DigitalOcean? If so, you need to get your head straight fast, because this is not even remotely acceptable. You cannot defend the current practice, because it is not defensible. If you don't understand why that is, you need to sit down and think about it until you do.

Right now, as a customer of yours, my thought is this: if you think this isn't important and doesn't need to be called out, what else have I missed? What other crazy data leaks do you allow by default with the defense that I could turn them off if I cared? I hope and assume the answer is "none", but now I'm rather worried.

I can kind of sort of understand how one might end up building a system like this, thinking that it was a good idea at the time. But I cannot understand at all how someone could possibly defend it once it's pointed out that it's terrible.


Hey Mike,

I'm going to give you a call later this afternoon but I wanted to clarify. First, Moisey and I worked on this this morning, so worth a read: https://digitalocean.com/blog/

Second, the way this was approached was super super confusing, originally, as it was in 140 characters, on the twitters. Maybe it's my shortcoming for not totally understanding the situation before I spoke, but information was fairly fragmented.

First: Yes, I do speak on behalf of DigitalOcean.

My original understanding was that this issue was with the secure delete flag not working when being passed. This prompted me to request, and continue to request, that if that was the case, security@ be notified with an outline of what is going on per http://digitalocean.com/security/ - If it's expected behaviour, while still not good at all, it isn't my call, nor was I prepared to call the company into the office at midnight on a Sunday knowing we would issue a software update in the morning. Had the flag not been respected, I would have immediately called the senior engineering team as well as Ben and Moisey, so fully understanding the situation was very important.

As a customer, I'd like you to know we do take security very very seriously, it's something we discuss going into everything, as we appreciate healthy conversations about the way our product works.

I spent 4 hours last night trying to figure out exactly what was going on; it felt very difficult to get a straight answer of "your policy fucking blows and you better change it tomorrow" - That's something I can take to the bank, but I'm sorry if I wasn't clear.

j.


Thanks. I think your initial reply here was way off base, but I see that you guys have done the right thing now, and that's commendable. There's nothing more I could want that doesn't require time travel.

Regarding the call, just e-mail if you want to get in touch directly for whatever reason. But I think we're square.


Appreciate it.


>The first thing I said is that most people these days understand the importance of a responsible disclosure, and that we take all security issues very seriously. Not following responsible disclosure with a company such as DigitalOcean is extremely irresponsible

Oh bullshit. Don't deflect the issue here by complaining that you don't like full disclosure policies that many security experts agree with. (Such as, I don't know, Bruce Schneier?) If you want to get into secondary levels of annoyance, how about the fact there have been multiple instances in the past with DO that were only resolved by open/full disclosure on forums?


As someone who isn't a DO customer, the thing that most dissuades me from becoming a customer is that this is a case of insecure by default. How many more of those are lurking around?


Here's another compound fuckup for you:

1. DigitalOcean users are unable to install their own kernel updates!

2. DigitalOcean have to bother making a new kernel image available via their admin interface; they haven't done this for over six months of Debian kernel updates in my experience.

3. Even if DigitalOcean did make a new kernel available, there's no notification to inform the customer that they have to log in to the admin interface and pick the new kernel from the list, then reboot their VM.

4. The list of kernels in the admin interface is sorted... bizarrely. I check it every so often and there is no sensible overall naming scheme; you are presented with a popup menu listing every single kernel for every single distribution; the latest kernel for Debian is in the middle of the list.

5. My attempt to resolve these issues with DigitalOcean support convinced me that the person I was corresponding with has no idea what a kernel even is, much less that DigitalOcean's list of available kernels is... lacking.

This situation, plus the longstanding lack of progress towards IPv6 support, the lack of ability to control kernel parameters, and the lack of a way to snapshot the filesystem for backups, makes me an unhappy DigitalOcean user who is going to jump ship to Bytemark at the earliest opportunity.


Funny, I've been able to update off the standard package repos


You may have updated the package, but did you really boot from it? Run 'uname -v' to check:

    $ uname -v
    #1 SMP Debian 3.2.46-1
DigitalOcean systems do not boot from the kernel image installed within your VM; they are externally provided.

This reminds me of something I omitted from my original rant. I've actually had to pin the kernel image package that I've got installed on my VM to the version that DigitalOcean provide:

    linux-image-3.2.0-4-686-pae:
      Installed: 3.2.46-1
      Candidate: 3.2.51-1
      Version table:
         3.2.51-1 0
            550 http://http.debian.net/debian/ wheezy/main i386 Packages
         3.2.46-1+deb7u1 0
            550 http://security.debian.org/ wheezy/updates/main i386 Packages
     *** 3.2.46-1 0
            100 /var/lib/dpkg/status
This is because an unforeseen ABI break in some netfilter module means that if I install the newest package and then reboot, one of the modules used by my iptables setup fails to load. ferm notices this and rolls back my firewall configuration to the default state, which allows all traffic. I noticed this, but I wonder how many other customers with similar setups did not, and hence have not noticed that their iptables rules are incorrect or absent.
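
For anyone in the same situation, one way to hold the package at the installed version (package name taken from the output above) is simply:

    # keep apt from pulling in the ABI-breaking kernel package until
    # DigitalOcean offer a matching external kernel image
    apt-mark hold linux-image-3.2.0-4-686-pae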


Oh FFS. You can't whine about "responsible disclosure" while saying "As far as I can tell here, there is no unexpected behavior that isn't documented or stressed."

If it's documented, you've already disclosed it yourselves.


So you're saying, in the same response that 1) there was no "responsible disclosure" and 2) that this wasn't actually a security issue. You can't have it both ways.


"I never borrowed your saw, and anyway it was broken when you lent it to me."


As a DO customer I find this completely unexpected and unacceptable. A secure delete should always be done before the space is transferred to another customer; if this takes time, it should be scrubbed as a queued job before the next customer gets it.


Agreed (also as a DO customer), DO please follow the instructions of Aaron Friel: https://news.ycombinator.com/item?id=6983270

It's a win-win :)


> Or, that our policy on not doing a secure delete by default isn't something you agree with?

This one. Choosing insecure defaults for a virtualization API is a Bad Idea. As a rule of thumb (to put it bluntly), people are dumb. If you give them a loaded gun, they will shoot themselves with it. And they will blame you for it. At least put the safety on and make them take a conscious step before blowing their face off. Don't mean to tell you your business, but seriously, insecure defaults are a Bad Idea for a virt API.


While you're totally right, this doesn't even come down to "people are dumb". The documentation is simply lacking here. At no point that I can find do they discuss the ramifications of not using the scrub option, and even a smart person could reasonably expect that not using that option still doesn't leak your data, just in some other way (one less safe against certain attackers, presumably).


It doesn't seem like a security issue by DO in any way. It seems that the fog API (the project this 'issue' is filed against) doesn't allow a user to access the required flag to scrub the drive. Not a DO problem in any way I can tell, assuming the scrub parameter is working correctly.


This is a well-documented situation that almost every provider of 'consumable infrastructure' before DigitalOcean came along has faced and solved.

It is disheartening to see the same mistakes being made.

Whilst I absolutely see their USP was always solid-state storage, and that has pitfalls in terms of how you can scrub data to avoid it being leaked, the platform should take every precaution to protect customers data.

There shouldn't be an option to 'scrub data', and it shouldn't be defaulted to off so they can save some hassle and avoid spending a few dollars. It shouldn't be an option because it should be on all the time; anything else is surprising the customer mightily.

"What do you mean, my data leaked? Oh that's fine" -Nobody

"Why'd you pick a provider who doesn't take our company's information security seriously?" -Every boss anywhere


It is absolutely a DO problem in two ways: 1) they default to bad behavior and 2) their UI does not make it at all clear what the consequences of that bad default actually are.


I have to agree with the other posters, this response is disappointing. The initial concern the other posters have is obviously the lack of a secure default.

Now I've known about this for quite some time and I mostly use DO for development and testing Chef recipes, so I don't have an issue with it being off by default. I love the transparent $5 pricing. But I also don't store sensitive information on DO, yet. I also assumed when I was a newbie that any sensitive info would be deleted.

The easiest option would be to include a notice next to that checkbox saying that if there was any sensitive information you should select this option. I understand that SSD wiping would drive up your costs. Another option would be to switch to mechanical hard drives. I don't know how easy that option is with your setup.


DO please follow the instructions of Aaron Friel: https://news.ycombinator.com/item?id=6983270

It's a win-win :)


Your attitude and response here is exactly why disclosing this was in fact "responsible".


I asked three questions so I could address the title of the thread, that our users' data is being leaked.


You either know or should know the answers to the questions you've asked in plainly bad faith, as the problem has been clearly described both here and in the linked github issue, which isn't even filed with DO because DO has already made it crystal clear you don't care about having secure defaults in this matter.

The issue was instead filed against fog, so that users of that library may be protected to the extent possible under the circumstances.

In other words: This isn't your issue anymore. You've already publicly dismissed it. Worse, you've gone back on an earlier promise about it.

It is now in the hands of the community to try and protect your customers, since you have refused to.


To be perfectly frank, I think the title of this post is totally disingenuous and started the wrong conversation, and that makes me sad because I know we do care about our customers and their security. Had the title said "DigitalOcean doesn't care about its customers' security" I'd have been happy, because that would start a conversation we should probably be having about how data is deleted.

As it seems like this is actually an issue with people not liking how our product works, I've begun the internal conversation going about two things:

First, communicating again to our customers by way of a blog post that this is how the product functions, as well as highlighting any relevant tutorials.

Second, working with the product team and engineers to either reverse this functionality at best, or at minimum draw greater attention to it.

j.


Not EVER presenting one customer's data to another customer is a basic part of any business involving multiple customers. In our case we store the UserId in every database table along with the other data, and validate that against the actual logged in user before returning it. It's why I was so horrified at a bug in Cyrus IMAPd replication which occasionally overwrote files belonging to other users.

And it's why you are wrong. Giving the same data back to the same user (holding their blocks) would be fine - but allowing customer data to be read by other customers, for any reason, is bad practice in any area of business.

In many sane jurisdictions, the practice when selling something is to factor the cost of eventual disposal or recycling into the initial purchase cost. This is required by law, for example deposits on drink bottles which can be redeemed by returning the bottle.

In your case, the honest thing to do would be to factor the eventual cleanup of data from the disk into the initial purchase cost of the service. So the cost to provision a VM would include the wipe cost.

Pointing out that you don't do that is a community service. Congratulations to the author of the post for noticing the issue and bringing it to everyone's attention. Now we can all make an informed decision.


In my opinion, implementing the deletion of data in a way that can be forgotten by the user (and even worse, by the API) is a bad idea. It's like burying unexpected clauses in the middle of a 50-page terms and agreements.

It's definitely not illegal or reprehensible, but anyone doing *aaS has exponentially more knowledge than its users and knows it. Anyone who has once tried not to impose minimum password complexity knows what I mean.

To me it seems that data cleaning should be something you choose as you purchase the service. I understand that cost might be a factor in this particular case, but then why not communicate about it and make a premium or discount system? It would probably put them in a good light with both users with sensitive data and those who don't care.


"at minimum draw greater attention to it." No that is not sufficient. You have to fix this, and you should have said so even if it was early on Sunday morning.


I presume the audience felt a Chief Technology Evangelist had more control over the tone and intent of his messages on the Internet, and therefore actually meant to couch his three questions within pointy remarks about responsible disclosure of a problem which was announced as fixed by DigitalOcean and reported as such in the press over 6 months ago.


This reminds me of my own story: a few weeks ago I was trying out their service, and on a newly created droplet I noticed a... shell history of downloading and executing a shell script:

    1  clear
    2  ls
    3  clear
    4  wget https://kmlnsr.me/cleanimage.sh
    5  rm cleanimage.sh
    6  cd /tmp/
    7  wget https://kmlnsr.me/cleanimage.sh
    8  chmod +x cleanimage.sh
    9  ./cleanimage.sh
This looked very disturbing, so I went and checked what that script is. It is available for everyone to read, and it seems to be part of their provisioning procedure for the VMs, written by some guy who works for DigitalOcean as a 'Community Organizer' (though at that point I thought the website might have been created by an attacker and be misleading).

Not only does it look bad and alarming to customers, it also poses a security threat: an attacker could target his website and/or server and replace the script with something nasty. How long before they'd notice? No idea, but I opened a ticket about it right away, giving them some advice on why it's bad (availability, scaling, performance, security and PR reasons) but also how to handle it better, and it seems nothing has been done about it so far.

That rings a bell in my head not to use DigitalOcean's service, as the things they do look pretty amateur.


To be honest, this looks like an artifact of the base image creation process.

"This file is used to clean up traces from DigitalOcean images before being published."

I don't think someone actually logged in and ran those commands on your instance. I could be wrong, but I'd bet this is just from sloppy creation of the base image leaving weird stuff in history after the image was published.


I did not say that someone does it manually; I only said that, at first look, it looks really suspicious. And in the end, the support told me that:

"This is Kamal Nasser's script that has been set up to run on the images. The cleanimage.sh script sometimes doesn't clear the history. Thank you for bringing this to our attention. I've brought this to his attention.

There is nothing to worry about with this."

So in fact, it seems like it is being used, rather than being a leftover in shell history. In addition to that, I later got an answer on the same ticket from a different support member saying that this script is not being used and just sits on that web page, but it all looks really bad in terms of professionalism.


Agreed. I wasn't defending DO's professionalism there, just saying that from my experience creating base images for a (different) cloud provider, it looks like an artifact of image creation rather than a real cause for alarm (I would certainly be alarmed if random people were logging in to my instance and running random scripts from the web!).


Yah, that's sloppy and won't happen again. Sorry you had that experience.


By the way: that script is meant to be sourced, not execed: the HISTFILE manipulation at the end only works if the script is sourced --- that's why you see this script in your shell history.
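
In other words (assuming the script ends by redirecting HISTFILE somewhere disposable):

    ./cleanimage.sh      # runs in a child shell; its HISTFILE change dies with it
    . ./cleanimage.sh    # runs in the current shell, so the history suppression sticks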


Yeah, I've seen this on my instance as well.

If the server was compromised... :\


I had this exact same thing.


Oh, hey guys, they've responded. It's no big deal, they just disabled the security because _users were complaining_.

Turns out it "add[s] a very large time to delete events" when you actually delete things when a user makes an api call to DESTROY. Who knew?

http://i.imgur.com/MFW8ng6.png


I got the same response from them via Twitter. I don't think it's acceptable.

I worked for a hosting company that made data destruction their problem (i.e. we wiped the disks after the instance was terminated) because we didn't want new customers seeing non-zeroed disks and thinking we do not care about security.


Damn straight.

If I have something important on a VM it is purged before issuing a destroy. I will overwrite the block device myself so I know it is empty.

Why should I have to pay for DO to do the same again just because someone can't be bothered to read the manual?


As I replied to you elsewhere, I feel your comment here may lead people to believe that it would cost DigitalOcean money to correct this behavior.

This is wrong; it is costing DigitalOcean money not to fix it - in terms of SSD lifetime, not TRIMing those blocks increases fragmentation of the internal physical layout of the pages of flash memory. The behavior of DigitalOcean's virtual machines has surprisingly managed to achieve the worst possible outcome. Their hardware is being misused and their customers' data is being mishandled.


Well that's stupid. They should also take care of the case where I delete a vm. What do I care that it takes a long time for them to delete the data? That shouldn't cost me any money, as I have freed the resources from my account.


Interesting: this sounds like a recurrence of the same issue which was described a number of months back:

https://www.digitalocean.com/blog_posts/resolved-lvm-data-is...

At the time, the blog post claimed that the issue was resolved and that data was now being wiped by default. I wonder why that would have changed.


I'm the one who reported it. I was able to recover someone else's web logs from 29 December 2013 on a VM an hour old.


Talk about a link-bait title. It's a bit hard to call it a leak; it's a configuration option that is well presented in the web UI. It is optional because it adds ~10 minutes of billing to the small 512MB VMs, so it is up to you whether you do it.

If you're using an overlay or API on top of a cloud or service, it's the overlay's responsibility to ensure consistency with your expectations. The API is consistent with the UI.

While other cloud providers treat the time that this takes as non-billable, DO don't. Getting higher utilization is how they are able to offer their prices and still have some modicum of service.


Let's be clear about what this is: they are charging their customers after their customers have deactivated a service (destroyed a VM) to not create the situation wherein they give that data to another customer later.

What sort of mental gymnastics are required to make that a reasonable choice?


>they are charging their customers after their customers have deactivated a service (destroyed a VM)

They are charging their customers for the number of minutes it takes to safely destroy the VM. This is not a charge for something coming 'after'. It's fundamentally a charge for their actual server use, not a bonus fee.

>What sort of mental gymnastics are required to make that a reasonable choice?

They aren't charging for security, they are giving you the option to buy less server time if you don't need security, or handle it yourself by wiping only the sensitive files. There are no mental gymnastics here.


I think you hit the nail on the head here: offering the option to buy less server time if you don't need to wipe data is probably reasonable.

Now, the problem here is that DO turned that choice around, and are therefore not providing security by default, but offering you the option to pay more to get it.

Additionally, this is poorly advertised (the API docs do not clearly state "Your data may be accessible by other users!"), and that explains why many customers are (reasonably) a bit pissed at DO.


Yeah, they screwed up the default via the API, but the choice is a reasonable one to have.


It takes 10 minutes to destroy a 512mb VM?


Looking at their pricing page, it looks like an instance with 512MB RAM comes with a 20GB disk. Depending on host load, IO and process niceness etc, I can see a `dd if=/dev/zero of=...` taking ~10 minutes easily.


If the hardware is spending its cycles on your workload, then it definitely makes it a reasonable choice. It's not like they can sell those cycles to someone else until your job is done.

Besides, we are not talking about a high-margin business here: $5 VMs when most providers are charging 4x that. It's not unreasonable to expect that you're going to have to pay for extras. Similar to a budget airline, you get what you pay for. If you want a service that includes that cost in your other fees... then use AWS, Rackspace or one of the 1000s of others.


The basic offerings should be secure. You shouldn't have to know what all the bits and pieces of a custom interface mean before you start using a service in order to use it safely.

Seriously, there should not be an option "Shall we pass your latent information onto the next user?" left active by default. If people want to save that trivial amount of money, then let them turn off safety themselves.


If you care about your information I think you should also take responsibility for it. I can't see the point about blaming others for their defaults, it is made quite clear when you destroy a droplet.


Hrm, I should also add that for a $5/month VM, 10 minutes of time is worth $0.0012. And that 10 minutes doesn't require the RAM or CPU component, just the SSD, so it's much cheaper than that in actuality. It's silly to squabble over pricing that low. It would take a million destroyed VMs (at list price) for the cost to be much more than what's in the office's petty cash box, and it's worth it for the security implications, not to mention PR.


Please see my post debunking everything about DigitalOcean's need to spend even one minute scrubbing user data. The fact that they are using SSDs makes it extraordinarily cheap for them to scrub customer data using the TRIM command (on Linux: by sending the BLKDISCARD IO command). With that, they can logically zero hundreds of gigabytes of customer data within seconds.
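
Concretely, on the host that amounts to something like this (volume names are hypothetical):

    # tell the SSD that every block backing the destroyed droplet's volume is garbage
    blkdiscard -v /dev/vg_droplets/droplet12345
    lvremove -f vg_droplets/droplet12345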


I agree that there are ways to implement this. And perhaps they should (mind you from experience not all SSDs do this properly).

I simply can't blame them for delivering what they say they are going to give me, even if they could have built their infrastructure better.


Where do they say that they will give all of your data to the next customer to occupy your spot if you don't use this option?

I have looked at the UI and the API docs and it simply is not there. The scrub option says that it writes zeroes to your partition, but it says nothing about giving all your data away if you don't do that.


I think the issue of defaults is orthogonal to the issue of billing. For example, they could make scrubbing on by default (or even mandatory) and still bill customers for it. Of course they would have to disclose this.


I completely agree for the UI. For the API I think it is completely fine to have it not as default (In fact I would argue having a boolean controlling an optional action default to false for an API is the most correct action).


In this case one must balance the right API choice with the right security choice. Security wins every time, or at least it should, so the default should be wiping. If one would insist on having API booleans default to false, just change the input polarity (e.g. rename "wipe_disk" to "skip_wipe").


Fail safe. Fail safe. Fail safe.

It's fine to make this optional. But it needs to have large flashing red warning lights all around it and it needs to be off by default.


Yes I can't believe people are flipping out over a Fog issue. If you care about the privacy of your data, it is your responsibility to make sure it gets erased. If Fog wants to put that as a default, sure do it.


Actually thinking about this more, I am starting to understand the outrage. I think the commenters are right, it should be secure by default.


Anyone who's tempted to use "the cloud" for anything sensitive should first be forced to write out, at least 5000 times, in longhand:

"The cloud. Somebody else's computer".

I think cloud computing is great for the right applications, as long as people understand the risks.

But there will always be problems like this. Always. This is part of the hidden cost of "simple cloud hosting".


What about complicated cloud hosting? Rackspace is still "Somebody else's computer".


Since day one, Amazon EC2 used a copy on write system with their LVM volumes to protect against this problem (without them having to do expensive zeroing operations).

This has been an identified and solved problem for YEARS. No excuse for a modern VPS/IaaS provider to be leaking customer data in this way, except incompetence.


Amazon shifts the cost of zeroing onto the user of the new VM. Because the blocks are zeroed on first access, writes are twice as slow on the first pass. You can try that out easily by running `dd if=/dev/zero of=/mnt/test` twice.


Yeah, but that isn't a big deal especially when you consider the alternative: leaked data.


This is a huge problem and there seems to be a good deal of misinformation about this issue that has confused things. I'm going to debunk two things: first, that DigitalOcean is not violating user expectations (they are), and second, that doing this correctly is difficult (it isn't). The tl;dr is that if DigitalOcean is doing this, they are not using their hardware correctly.

First, it's not uncommon for virtual disk formats to be logically zeroed even when they are physically not. For example, when you create a sparse virtual disk and it appears to be XGB all zeroed and ready to use. Of course, it's not. And this doesn't just apply to virtual disks, such techniques are also used by operating systems when freeing pages of memory - when a page of memory is no longer being used, why zero it right away? Delaying activities until necessary is common and typically built in. Linux does this, Windows does it [http://stackoverflow.com/questions/18385556/does-windows-cle...], and even SSDs do it under the hood. For virtual hard disk technology, Hyper-V VHDs do it, VMWare VMDKs do it, sparse KVM disk image files do it. Zeroed data is the default, the expectation for most platforms. Protected, virtual memory based operating systems will never serve your process data from other processes even if they wait until the last possible moment. AWS will never serve you other customer's data, Azure won't, and none of the major hypervisors will default to it. The exception to this is when a whole disk or logical device is assigned to a VM, in which case it's usually used verbatim.

This brings me to the second issue. Because using a logical device may be what DigitalOcean is doing, it's been asked if it's hard for them to fix it. To answer that in a word: No. In a slightly longer word: BLKDISCARD. Or for Windows and Mac OS X users, TRIM. It takes seconds to execute TRIM commands on hundreds of gigabytes of data because, at a low level, the operating system is telling the SSD "everything between LBA X and LBA X+Y is garbage." Trimming even an SSD with a heavily fragmented filesystem takes only a matter of seconds because the commands to send to the firmware of the SSD are very simple, very low bandwidth. The SSD firmware then marks those pages as "free" and will typically defer zeroing them until use. Not only should DigitalOcean be doing this to protect customer data, but they should be doing it to ensure the longevity of their SSDs. Zeroing an SSD is a costly behavior that, if not detected by the firmware, will harm the longevity of the SSD by dirtying its internal pages and its page cache. Not to mention the performance impact for any other VMs that could be resident on the same hardware as the host has to send 10s of gigabytes of zeroes to the physical device.

Not only is DigitalOcean sacrificing the safety of user's data, but they're harming the longevity of their SSDs by failing to properly run TRIM commands to clean up after their users. It hurts their reputation to have blog posts like this go up, and it hurts their bottom line when they misuse their hardware.

Edit: As RWG points out, not all SSDs will read zeroes after a TRIM command, so other techniques may be necessary to ensure the safety of customer data.


In your second paragraph, you're conflating two different things. File-based disk images don't leak data when they're deleted because the filesystems those images live on ensure that (non-privileged) users can't get at data from deleted files. Sparse images can be smaller than the data they contain because...well, they're sparse images. They're files with holes in them, and the filesystem automagically turns those holes into zeroes on read.

Now, about Trim... Trim is only an advisory command. You tell the disk, "I'm not using these LBAs anymore, so feel free to do whatever with them." The disk has the option to completely ignore your Trim command, and even if it does mark those LBAs as unused in whatever LBA->NAND mapping table it uses internally, the disk can also continue returning the old data on reads of those LBAs if it wants to. There are disks that make the guarantee that Trim'd LBAs will always read back zeroes until written again (an ATA feature called Read Zero After Trim), but I'm guessing DigitalOcean isn't using SSDs that support RZAT since that's generally only found on more expensive SSDs, like Intel's DC S3700.

What I'm getting at is that Trim isn't guaranteed to do what you think it does. Unless the disk supports RZAT, the only way you can guarantee that the disk won't return old data in response to a read command is to write zeroes over that block.

If you're a VM provider and can't count on Trim doing what you want it to (reading back zeroes on Trim'd LBAs) because your drives don't support RZAT, and you don't want to zero out partitions at creation or destruction time, the right thing to do is encrypt every partition with its own randomly generated key at creation, then destroy the key when the partition is destroyed. Users will see random data soup on their shiny new block devices, which isn't as nice as seeing a zeroed-out block device but is still nicer than seeing some other user's raw data. (Also note that doing this doesn't stop you from also issuing a Trim for a partition when destroying it so the SSD gains some breathing room.)
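
A rough sketch of that scheme with plain dm-crypt (the same trick commonly used for encrypted swap; key path and volume names are made up):

    # at creation: generate a per-droplet key and wrap the logical volume in it
    dd if=/dev/urandom of=/root/keys/droplet12345.key bs=64 count=1
    cryptsetup -d /root/keys/droplet12345.key create droplet12345_crypt /dev/vg/droplet12345
    # ...hand /dev/mapper/droplet12345_crypt to the guest as its disk...
    # at destruction: drop the mapping and destroy the key; the ciphertext left
    # on the SSD is useless without it, so no wipe pass is needed
    cryptsetup remove droplet12345_crypt
    shred -u /root/keys/droplet12345.key
    blkdiscard /dev/vg/droplet12345    # optional, gives the SSD its breathing room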


You're absolutely right and thank you for the clarification. I didn't intend to conflate sparse files with file-based disk images, but I was trying to convey that there can be a difference between the logical data of a disk image and the physical data, and that deferred zeroing is the default and the expectation of developers and sysadmins. Images can be sparse and/or file-based, as the features are orthogonal, if cross-cutting.

More importantly, you clarify that RZAT is a necessary feature for what I'm mentioning to work properly. You're right. They should both be ensuring the blocks served to customer VMs are zeroed on use and ensure that they are appropriately running TRIM commands to ensure maximum performance from their hardware. Not all SSDs perform RZAT, and it wouldn't be a bad idea for the host to ensure the device is logically zeroed for the VM anyway.

DigitalOcean could easily switch to doing both, or at least guaranteeing the former by creating new logical disks for customers as every other vendor does. If, as they have blogged about in the past, they are directly mapping virtualized disks to the host's LVM volumes, they are unnecessarily complicating their hosting set up and making their host configuration more brittle. With thin-provisioned/sparsely-allocated or with file-based virtual disk images, they can more flexibly deploy VMs with different disk sizes with minimal changes in host configuration.

Alternatively they could trivially ensure that even forensic tools would have a very difficult time recovering erased volumes by enabling dm-crypt on top of LVM and resetting the key every time a virtual machine is deleted. This could reduce performance on some SSDs (particularly SandForce-based models) but would allow minimal changes to their configuration to ensure deleted data is unrecoverable.


Using 1:1 mappings of LVM logical volumes to guest VM block devices is the most straightforward and performant method of doing it on Linux, short of doing 1:1 mappings of entire disks or disk partitions to guest VM block devices. While using file-based disk images would prevent data leaks between customers without any further effort required on the VM provider's part (assuming they don't reuse disk images between customers!), there are tons of downsides to file-based disk images, mostly related to performance and write amplification.

I don't agree that file-based disk images are more flexible than LVM's logical volumes — it's ridiculously easy to create, destroy, resize, and snapshot LVs.
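For readers who haven't used it, the day-to-day LV operations really are one-liners (names invented):

    lvcreate -L 20G -n customer1 vg0                     # create
    lvcreate -s -L 5G -n customer1-snap vg0/customer1    # snapshot
    lvextend -L +10G vg0/customer1                       # grow (then resize the filesystem)
    lvremove vg0/customer1-snap                          # destroy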


Until very recently there were serious problems with putting LVM under any sort of concurrent load. Making more than a few snapshots at the same time, for instance, was asking for trouble. I say "was" - I've got no idea if these problems were fixed. You just don't have those problems with file-based images.


Yeah, but you can't snapshot a file-based image, so LVM without snapshots is just as good (and much faster).


ZFS and btrfs both let you do this, as do qemu COW images.
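For example, with qcow2 images (file name assumed; the VM should be shut down or paused while snapshotting the image file directly):

    qemu-img create -f qcow2 disk.qcow2 20G            # new image reads back as zeroes
    qemu-img snapshot -c before-upgrade disk.qcow2     # take an internal snapshot
    qemu-img snapshot -l disk.qcow2                    # list snapshots
    qemu-img snapshot -a before-upgrade disk.qcow2     # roll back to it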


>Unless the disk supports RZAT, the only way you can guarantee that the disk won't return old data in response to a read command is to write zeroes over that block.

While this is true (the disk will never respond with old data through its normal interface once you have zeroed it out), it is important to remember that even zeroing the disk yourself isn't a guarantee that the old data is actually gone from the disk itself. The disk may present itself as a raw block device, but internally it may use error correction, wear levelling, or bad-block management schemes which mean that old data can remain on the media even though you have written zeroes over it: hard disks remap bad sectors, and SSDs relocate chunks of data when the flash cells in a block start to wear out. You would have to use forensic means to recover this information, but it can still remain. The only way to guarantee that the information cannot remain on the disk is to use encryption and make sure that the unencrypted key never touches the disk.


I don't think people care about the idea that a sufficiently advanced attacker with physical access to the hardware can restore old data anywhere near as much as they care that the next customer along gets a fully readable copy of their data just handed to them.


Yeah if the attacker has physical access there are other, even more awful things he can do.


The attacker could get physical access at the end of the hardware's life when the hardware is decommissioned and either thrown out, recycled, or sold.


Industry standard is to shred then recycle.


For example, when you create a sparse virtual disk, it appears to be X GB, all zeroed and ready to use. Of course, those zeroes haven't actually been written to physical storage.

From within the VM, all the VM will see is zeroes. It sounds like DO is giving VM instances direct access to the underlying SSD or something like that. In fact, I'm having a hard time figuring out precisely how this is occurring. When you create a new VM, how can the VM possibly be reading data from the host's hard drive? Isn't that the definition of a security problem, since VMs are expected to be isolated?

I hope someone will explain the underlying technical details more deeply, because this is very interesting.


Please read to the end of my comment - it appears what DigitalOcean is doing is giving the VM access to a logical device that is preallocated. Perhaps carved out of LVM or MD or some other logical disk. KVM's default behavior when using these sorts of devices is to present to the VM whatever data already existed at the lower level.


Er, I fully read your comment when it was 7 minutes old, but it looks like you've edited it significantly since then to fill in some missing details. Thank you for explaining, I appreciate it!


Apologies then, glad I was able to answer your question with the subsequent edit. :)


TRIM at the point of killing the VM doesn't really help though - all you are doing is wiping the blocks where the data is currently stored. Your logical volume could have existed in many different physical places on those disks or on others as the host rearranges for any reason or swaps old hardware for new, so there could be "ghost" copies of your (slightly older) data all over the place, people could have had it mapped into their new volumes ages ago already.

The only way to ensure your data is secure is to use encryption from the start (preferably full-volume encryption, and make sure the keys are not stored at the provider's end, so you'll need some mechanism for giving the VM the keys when it reboots and will have to trust that no one can somehow read them from RAM). Then you don't need to wipe the data at all: just destroy all copies of the keys and the data is rendered unreadable (to anyone given a new volume that spans physical media where your data once sat, it is indistinguishable from random noise).
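In practice that guest-side setup could look something like the following, using LUKS on a secondary partition; the device names are assumptions, and the passphrase is typed in over SSH after each reboot rather than stored anywhere on the droplet:

    # One-time setup inside the VM:
    cryptsetup luksFormat /dev/vda2
    cryptsetup open /dev/vda2 securedata      # prompts for the passphrase
    mkfs.ext4 /dev/mapper/securedata
    mount /dev/mapper/securedata /srv

    # After every reboot the key has to be re-supplied by hand:
    cryptsetup open /dev/vda2 securedata && mount /dev/mapper/securedata /srv

    # "Wiping" then reduces to destroying the LUKS key slots:
    cryptsetup erase /dev/vda2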


Note, sparse files have... performance issues. In the very best case you are going to end up with a lot of fragmentation where you aren't expecting it. I was on sparse files in 2005; I'm having a hard time finding my blog posts on the subject, but I didn't switch to pre-allocation 'cause I like paying for more disk, you know?

You are right about the zeroes, though; sparse files solve that problem, and this is what I personally find interesting about this article. I would be very interested to find out what Digital Ocean uses for storage. This does indicate to me that they are using something pre-allocated; I can't think of any storage technology that allows over-subscription that would not also give you zeroes in your 'free' (un-allocated) space.
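The sparse-file behaviour is easy to demonstrate, by the way (file name invented):

    truncate -s 10G disk.img            # 10 GB logical size
    du -h --apparent-size disk.img      # reports 10G
    du -h disk.img                      # reports (almost) nothing actually allocated
    # Reads of the unallocated range come back as zeroes, never as old data:
    dd if=disk.img bs=1M count=1 | hexdump -C | head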

>For virtual hard disk technology, Hyper-V VHDs do it, VMWare VMDKs do it, sparse KVM disk image files do it. Zeroed data is the default, the expectation for most platforms. Protected, virtual memory based operating systems will never serve your process data from other processes even if they wait until the last possible moment. AWS will never serve you other customer's data, Azure won't, and none of the major hypervisors will default to it. The exception to this is when a whole disk or logical device is assigned to a VM, in which case it's usually used verbatim.

Yeah, the thing you are missing there? VMWare, well... it's a very different market. Same with Hyper-V. And sparse files, well, as I explained, suck. (I suspect that to the extent that Hyper-V and VMware use sparse files, they also suck in terms of fragmentation, when you've got a bunch of VMs per box. But most of the time if you are running VMware, you've got money, and you are running few guests on expensive, fast hardware, so it doesn't matter so much.)

Most dedicated server companies have this problem. Most of the time, you will find something other than a test pattern on your disks, unless you are the first customer on the server.

No matter who your provider is, it's always good practice to zero your data behind you when you leave. Your provider should give you some sort of 'rescue image' - something you can boot off of that isn't your disk that can mount your disk. Boot into that and scramble your disk before you leave.

I know I had this problem too, many years ago, when I switched from sparse files to LVM-backed storage. Fortunately for me, if I remember right, Nick caught it before the rest of the world did. I solved it by zeroing any new disk I give the customer. It takes longer, especially when I ionice the dd to the point where it doesn't kill new customers, but I am deathly afraid (as a provider should be) of someone writing an article like this about me. Ideally, I'd have a background process doing this at a low priority on all free space all the time, but making sure the new customer gets zeroes, I feel, is the most certain way to know that the new customer is getting nothing but zeroes.
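For reference, that low-priority zeroing is roughly the following (volume names invented):

    # Zero the new logical volume at idle I/O priority so existing guests
    # on the same disks aren't starved.
    ionice -c 3 dd if=/dev/zero of=/dev/vg0/new-customer bs=1M oflag=direct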

>Zeroing an SSD is a costly behavior that, if not detected by the firmware, will harm the longevity of the SSD by dirtying its internal pages and its page cache. Not to mention the performance impact for any other VMs that could be resident on the same hardware as the host has to send 10s of gigabytes of zeroes to the physical device.

Clean failures of disks are not a problem. Unless you are using really shitty components (or buying from Dell) your warranty is gonna last way longer than you actually use something in production. Enterprise hard drives and SSDs both have 5 year warranties.

The dd kills disk performance for other guests on spinning disk if you don't limit it with ionice or the like, and that's the real cost. I would assume that cost would be much lower on a pure ssd setup.


FWIW, this kind of problem is minor in comparison to the potential exploits that have and will continue to crop up in shared computing environments.

https://www.cs.cornell.edu/courses/cs6460/2011sp/papers/clou...

https://www.informationweek.com/security/risk-management/new...?

http://www.insinuator.net/2013/05/analysis-of-hypervisor-bre...

http://slashdot.org/story/12/03/02/0059202/linode-exploit-ca...

http://thoughtsoncloud.com/index.php/2013/07/how-your-data-l...

I was trying to find any cases of a public cloud provider's customer data being leaked or easily visible on the internal customer network, but didn't come up with anything. Somebody's got to do a study on the major cloud providers and see if the good old methods of subverting network routes still work, or if you can easily MITM VM neighbors. (My guess is you can...)


Just a random idea from an ignorant individual...

What if DO actually encrypted the SSD space with a key that only they have, creating a new key for each droplet?

Then any droplets that are created later in a deleted space would just see effectively random data, no?


You can put the key on the disk itself too--you just need to zero out the space containing the key when you're done. That's how phones with encryption and remote wipe features usually do it.

In fact, a lot of SSDs, e.g. Samsung's, already work this way, transparently AES-encrypting data before it is written. (With AES in hardware, the overhead is negligible, even for high-performance drives.) The "Encryption" feature they advertise actually just lets you set your own key for the encryption key container--it happens either way.


Intel FDE, as far as I know, also works this way.

The key is protected. In the event that the drive must be re-provisioned (like in the case of a lost password), the decryption key is simply overwritten by the new key, rendering the original data unreadable.
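That key rotation is also what an ATA Secure Erase typically triggers on a self-encrypting drive, something along these lines (device name assumed, the drive must not be security-frozen, and this wipes the whole drive):

    hdparm --user-master u --security-set-pass p /dev/sdX    # set a temporary drive password
    hdparm --user-master u --security-erase p /dev/sdX       # drive discards/rotates its internal key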


Maybe they "do stuff" with the data, like they've got an internal process to give historical access to the NSA or any random .gov upon request. No need to assume that massive pushback against obvious security improvements could only be ignorance or incompetence. Adding security could throw a wrench in the works if there is a business process or procedure that directly relies on insecurity. If they have a national security letter requiring them to do this, they won't be able to talk about the topic intelligently in public.

Or, TL;DR: your idea would be great if they wanted a secure product, but they may legally be prevented from providing a secure product or even talking about the topic.


It's almost public knowledge: there's an option in the GUI called "scrub data", and the absence of a tick there implies that the data is not going to be erased before the VM's partition is reassigned. I had a chat with support months back about "erase data" not being in the API at the time, and the solution I came to was just to ditch their API and go back to scraping their forms for the option.

That said, this would probably go down better for the company and the community if you tried a private disclosure rather than posting about it on Github.

https://www.digitalocean.com/security


I've been a DO customer for about a year and some, this is news to me. So no, I wouldn't say it's public knowledge.

That said, I haven't really destroyed many of my VMs.


http://i.imgur.com/Aliyewf.png

I suppose when I made that comment I was expecting everybody to be like me; eager to flip switches to see what they do.


I would expect that option to do something along the lines of scrubbing the data on the hardware level so that someone who physically examined the disk wouldn't be able to access my data. That's because I'd expect my data to be at least logically wiped in the first place.


Same here


That does not actually state anything that indicates they will just hand over all of your data to the next random stranger.


If the "securely scrub" parameter has ANY kind of performance impact, which I suspect it does, then isn't it a good thing that DO gave their users the choice? There are plenty of use cases where I would trade privacy/security for performance.

Anyway, a disingenuous title to say the least.


It is good to give users the choice. It is bad to default to the dangerous choice. It is even worse to not describe the bad consequences of the default anywhere.

Nowhere in the DO UI for destroying a droplet does it indicate that they will leak all of your data to the next customer if you don't check the box.


This is bad for Digital Ocean, as the checkbox is nothing but a legal excuse to let beginners shoot themselves in the foot, and for a new company this isn't a situation you want to be in. Look at Amazon: how do they solve the same problem? Is it harder for them? Is there a checkbox? When you compete with Amazon you have to be as good as they are and better, not worse, and treat beginners better than experts!


Amazon doesn't do SSD VMs for $5 a month... If you are a beginner, you are using the web GUI, which clearly states you need to tick the box to securely wipe all the data (which has a cost associated with it). If you are using the API, then you should RTFM (Read the F Manual). If you build a service on top of the API, then it is the service's fault for not exposing the option (which is what the gist is about).


The opposite of "securely wipe all the data" is not "give all of your data to the next customer".

The GUI does not indicate anywhere that data is leaked if you don't check the box.


From above: http://i.imgur.com/Aliyewf.png

What that poorly worded checkbox says to me: "tick this box if you want to prevent DigitalOcean from reading your stuff."

Nothing in that web GUI "clearly states" that not ticking the box will allow the next random user to read my files.


Well, I don't think they want to advertise themselves as "the subpar service that doesn't honor privacy and trade secrets for just $5 per month". Again, ToS and manuals are legal safeguards, not moral ones.


New response https://digitalocean.com/blog_posts/transparency-regarding-d...

I like DO as a service, but this is kind of strange. Humans always act the same way: when catastrophe hits, they want to sit it out, underestimating the impact.


Please, DO, do something brave and make the default behavior sane (there is absolutely no way I would expect another customer to be able to get my data with a one-line shell script, ever). I LOVE your service, so don't screw it up, PLEASE!!


DO released a blog post on the issue[0]

[0] https://www.digitalocean.com/blog_posts/transparency-regardi...


It would be interesting to see a survey of the various cloud VM services to find out whether any of them will return non-zero (data) blocks for uninitialized storage.

Sounds like a major risk if SSH keys, SSL keys, passwords, etc. can leak this easily.


Also, even if they zero the primary block device, what about swap if it's provided as a separate block device?


DO provides a single block device (so swap is on the same block device), which gets zeroed if you set the scrub data option.


I just moved one of my apps from Linode to DO several days ago; some of the explanations here just blew my mind.


I have created a uservoice issue to scrub by default, please vote: http://digitalocean.uservoice.com/forums/136585-digitalocean...


Onapp has the same issue by default: it's not configured out of the box to zero-fill old disks as they are destroyed. This needs to be specifically enabled in the config file by the administrators.

If you are running a VM on top of their platform, you may want to check to make sure this is enabled.


For those wondering how that compares to AWS, the AWS security policies are here: http://awsmedia.s3.amazonaws.com/pdf/AWS_Security_Whitepaper...


Hmm, have you done this with private disclosure? If not, please do that next time. It is unethical to disclose a security-related issue with a PaaS publicly without going through the responsible party first. Even if a security issue you want to report is actually something only the owner can see... I don't know what this does because I don't really use DO, but just judging from the way it is reported, it looks like you haven't...


Even if one bought the ridiculous notion that private disclosure of vulnerabilities is a good thing, why would you think it would be necessary to privately "disclose" to anyone what Digital Ocean already publicly documents in their API?


[edit]: okay I misunderstood the issue here.

Some days I forget to pull my front door fully shut. No home invasion yet, because nobody has noticed that the door wasn't fully closed. Now I am telling people that I forget to close my door, and I will get an invasion.

Some people's repos have passwords committed. It only takes someone googling for it to find out. If someone posts that on HN, yeah, it's on the web anyway, but it only takes one person to make my password known instantly.


You are extremely confused. Digital Ocean does this deliberately and already publicly documents it. It is intentional.

This is not a bug report to Digital Ocean or any other PaaS. It is a request for a third-party library to support an option Digital Ocean's API provides.


[edit]: okay I misunderstood the issue here.


This is the equivalent of complaining very loudly and publicly that someone broke into your house after the police warned you, "Hey, you shouldn't leave your door standing open -- someone might break in."


>It is unethical to disclose security-related issue of some PaaS publicly without first going through the responsible party first.

Says who? You?

DO has a history of not responding to issues UNTIL they are publicly disclosed. And in any case, your iron-clad "argument" is a matter of opinion and nothing else. Many people prefer full disclosure.


> Says who? You?

> And in any case, your iron-clad "argument" is a matter of opinion and nothing else.

First, let me repeat: I did get the story mixed up, and the ethical approach I am referring to doesn't quite apply to the current story.

> DO has a history of not responding to issues UNTIL they are publicly disclosed.

It does not matter what happens between DO and whitehats. If an OS command injection is discovered, even if DO has a history of not responding to security issues, the moment the vulnerability is discovered a whitehat should alert DO privately first. If they ignore it again, then of course you can let the public know and let your zero-day exploit begin. In this respect, public disclosure before private disclosure is unethical.


>Regarding this, public disclosure before private disclosure is unethical.

I guess you consider Bruce Schneier a peddler of unethical behavior, then? He maintains the threat (and execution) of full disclosure is vital to maintaining security.


Bruce does NOT advocate an end run around the provider. You talk to the provider first, always.


Exactly.

Full disclosure does this. Before full disclosure was the norm, researchers would discover vulnerabilities in software and send details to the software companies -- who would ignore them, trusting in the security of secrecy. Some would go so far as to threaten the researchers with legal action if they disclosed the vulnerabilities. https://www.schneier.com/essay-146.html

If the code is public, just fixing the code without CVE or similar is considered bad because diffing the code will yield the vulnerability.

You don't go around telling people you found a vulnerability until it is fixed (in the case of a vendor ignoring the alert, it is ethical to tell the public).


http://www.wired.com/wiredenterprise/2013/04/digitalocean/

https://www.digitalocean.com/blog_posts/resolved-lvm-data-is...

This sort of terrible behavior was by design and known. They got written about and claimed to have fixed it, but didn't.


There is someone in this thread claiming to speak for DO, and he says that it's fully documented:

https://news.ycombinator.com/item?id=6983520

There's no point in private disclosure for something that the company themselves documents.

Of course, he also whines about responsible disclosure. The mutual exclusivity of these two things does not seem to have occurred to him....


Linkbait. Title should be "Digital Ocean API is not told to scrub (securely delete) VM on destroy"


"Don't scrub" does not imply "give the data to the next guy".

If I delete a file on my computer without doing a secure delete, I know that the data is still on the disk and can still be recovered. However, I also know that in normal operation of the computer, that data will never show up again. There are no circumstances where I can create a new file and have it get filled out with the contents of the deleted file. There are certainly no circumstances where another user on my computer can do that and get my data.

I would have every expectation that this scrub option works the same way. That it defends against specialized recovery efforts, not random people making new VMs. DO's documentation says nothing to indicate otherwise.


What I don't understand is: why would anyone want to START with a VM that has someone else's data on it? Forget people wanting their data scrubbed on delete.


This makes me wonder: is AWS secure on this front?


Customer instances have no access to raw disk devices, but instead are presented with virtualized disks. The AWS proprietary disk virtualization layer automatically resets every block of storage used by the customer, so that one customer’s data are never unintentionally exposed to another. AWS recommends customers further protect their data using appropriate means. One common solution is to run an encrypted file system on top of the virtualized disk device.

[1] http://awsmedia.s3.amazonaws.com/pdf/AWS_Security_Whitepaper...


Have you gone to DO directly with this first?


According to them, it's either a bug they already fixed in April, or leaking data today by design?

https://www.digitalocean.com/blog_posts/resolved-lvm-data-is...

https://twitter.com/jedgar/status/417515181418479616


I don't really see how this is leaking data. I can use the Dropbox API to make a bunch of files with private data shared with the world, but Dropbox isn't "leaking" my data; I'm using the API in such a way that makes my data accessible. Not an exact analogy, and I would agree that the option should be on by default so people who know what they're doing can opt out and everyone else gets a safer default, but this isn't a "data leak."


It absolutely is a data leak.

I spun up a vm and ran "strings" on the blockdev and got this:

http://i.imgur.com/fJOxRN9.png
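For anyone who wants to repeat the check against their own provider, it's essentially this, run from a freshly created VM (device name assumed; adjust for whatever block device your VM exposes):

    # Dump any printable strings sitting on the "blank" block device.
    strings /dev/vda | head -n 100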

Some poor iPhone users in Portugal have no idea that the app they're using is backed by a webservice on a VM that gives its block storage contents to anyone who gives Digital Ocean a $5 PayPal payment.

If that isn't a data leak, I don't know what is.


That's like dumpster diving only on planetary scale and no actual dumpsters involved. Wonders of technology and interface designers that don't realize people don't read docs and expect things to work properly by default.

But the dd thing is really embarrassing here, I mean I'd expect some data on shared hardware being recoverable using hardcore forensics, but there are enough levels between hardware and dd that using at least one of them to make old data inaccessible should be both possible and pretty cheap.


If you make an API call that asks for your data NOT to be scrubbed, then it's not a leak that your data isn't scrubbed--you asked for it. If you haven't read the docs, you might not be aware that you're asking for it. That's a Bad Thing. No question. It should be enabled by default, to prevent unknowing users from leaking their own data. If you ask for a scrub and you can still find data on the scrubbed block device, then you have a leak from DO.


I read the API documentation. It's pretty short. Here's the relevant bit:

"scrub_data Optional, Boolean, this will strictly write 0s to your prior partition to ensure that all data is completely erased."

If I didn't already know about this issue, I would never have thought that leaving this option out would leak all of my data. My reading of the above option would be that, with it off, they would leave your data on the drive until it was reused, leaving open the possibility that e.g. the FBI could seize the equipment in the meantime and access it.

The opposite of "write zeroes to your partition" is not "give all of your data to the next customer".
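For completeness, passing the flag through the v1 API is a single extra query parameter, something along these lines; the URL shape here is reconstructed from memory and may not be exact, and only the scrub_data parameter name comes from the docs quoted above:

    curl "https://api.digitalocean.com/droplets/DROPLET_ID/destroy/?client_id=CLIENT_ID&api_key=API_KEY&scrub_data=true"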


I'd agree with you, except that in this case the API call is called "destroy". Were it called "deallocate", this would be a different story.


The bug in April was about the option not working.

The feature now works if you use it. By design, if you don't use it, you don't wipe the disk (saving you money).


There is a simple solution to this: don't trust providers to do what they say they'll do with your data. You should scrub any drive that's ever contained sensitive info before you throw it away, and terminating a VM instance is precisely equivalent to handing the VM's hard drive to your provider.

It's pretty easy nowadays to scrub a drive. Writing zeroes would suffice.

Personally, I'd worry more about what data is being leaked when your VM is paged to disk on your provider's servers. Parts of each of your VMs will probably reside in the pagefile at some point, so writing zeroes won't save you if the provider has bad disposal practices (like not scrubbing before disposal). So it seems impossible to avoid trusting a cloud computing provider at least a little; some basic trust is a requirement.

But that minimum level of trust should be the extent to which you trust them. Not scrubbing your drive before handing it over is placing faith where faith doesn't belong.


You may not always have the chance to scrub it yourself, for example when your VM has a hardware issue.


Is it the same issue, or a different one, from this one back in April of 2013?

http://www.wired.com/wiredenterprise/2013/04/digitalocean/?c...


lol of course.


these problems cannot be avoided


leaks leaks leaks. no good!



