Dropbox confirms that a bug within Selective Sync may have caused data loss (githubusercontent.com)
128 points by ghuntley on Oct 11, 2014 | 60 comments



Additional info from Dropbox support:

    We received several reports from users who used a Dropbox feature called Selective Sync and couldn’t locate certain files they’d saved in Dropbox. 
    When we took a closer look, we discovered that older versions of the Dropbox client had introduced an issue affecting a small number of users whose Dropbox application shut down or restarted while users were applying Selective Sync settings.

    In light of all of this, we've taken the following steps to ensure the Selective Sync bug won’t affect anyone else going forward:

    1) we've patched our desktop client so this issue doesn't exist in Dropbox anymore;
    2) we've made sure all our users are running an updated version of the Dropbox client; and
    3) we've retired all affected versions of the Dropbox client so no one can use them.

    We've also put additional testing in place to prevent this from happening in the future.

    We’re very sorry about this issue and the trouble it might have caused. We’ll keep doing our best to ensure our users' data is always safe and available to them.


Just so you folks don't have to scroll sideways:

We received several reports from users who used a Dropbox feature called Selective Sync and couldn’t locate certain files they’d saved in Dropbox.

When we took a closer look, we discovered that older versions of the Dropbox client had introduced an issue affecting a small number of users whose Dropbox application shut down or restarted while users were applying Selective Sync settings.

In light of all of this, we've taken the following steps to ensure the Selective Sync bug won’t affect anyone else going forward:

1) we've patched our desktop client so this issue doesn't exist in Dropbox anymore; 2) we've made sure all our users are running an updated version of the Dropbox client; and 3) we've retired all affected versions of the Dropbox client so no one can use them.

We've also put additional testing in place to prevent this from happening in the future.

We’re very sorry about this issue and the trouble it might have caused. We’ll keep doing our best to ensure our users' data is always safe and available to them.


Is there a way we can contribute some CSS fixes to the HN code base? These could be quick and permanent fixes. Also, 12px Verdana? Mobile?


They've been reluctant to change the markup because of the many scrapers. Hence, the recently released API in preparation. An updated UI is incoming.


The majority get a poorer experience because a few people are scraping? I'm guessing 99% of traffic is from people hitting the site in browsers.


HN has historically subscribed to a form of deontological ethics[1] rather than utilitarianism.

[1] http://www.bbc.co.uk/ethics/introduction/duty_1.shtml

(I happen to agree with HN’s ethical position, but I invite you to look at the section “Bad points of duty-based ethics” in the link.)


You are assuming the rule "Never change the source code unless you have to" is a deontological imperative.

It's not really, is it? It's just something you believe and are justifying with obscure ethical arguments. Thanks for the reading :-D


Of course I see your comment after I finished reading the parent on my tablet. Thanks though!


I was affected by this, but I realized it at the time.

I have an older laptop that I turned on. It was a work laptop a few years ago, linked to my dropbox account, etc. Since then I had added a bunch of things like a bunch of git repos to a folder included in dropbox.

I turned on that laptop and Dropbox started using 100% cpu after a few minutes. Then the fan kicked on and it was annoyingly loud so I looked at dropbox and saw it was chugging along in the repos directory. I went ahead and clicked on selective sync, unchecked repos, and left it alone for about 5 minutes.

It was still 100% cpu, so I killed the dropbox task and restarted it.

Minutes later, on another machine, I went to fetch from one of the repos and it had a gnarly error. So I went about investigating.

I found my way to the dropbox events tab (on the website - the desktop client doesn't have this feature) and saw an event where dropbox decided to delete 7,800 files.

I submitted a support request, but before they responded I had figured out it was (mostly) in the repos directory, which I fixed by simply deleting the repos and pulling from one of my servers.

Anyways. There's my real world run in with this bug.


This is exactly why sync is not a commodity. Dropbox is the very best at what they do, and even they have bugs. So when someone offers to sync your files for less, ask why.


The sync heavy lifting in Dropbox is handled by librsync, or at least was at one point. librsync is very mature open source software, and this bug pertains to a particular interaction between the Dropbox GUI and a feature (selective sync) which they have somewhat tacked on to the core library. Long story short, you don't necessarily have to be Dropbox or employ a couple hundred software engineers to get inotify + rsync working well.


> The sync heavy lifting in Dropbox is handled by librsync, or at least was at one point.

Nope, highly custom process that involves librsync very little. The sophistication of what has to be done to solve this problem well would probably surprise you.


Isn't the desktop sync client built around the csync tool? http://www.csync.org


Yeah, rsync != bidirectional sync.


He wasn't talking about rsync the binary but rather about librsync [0], the library that allows you to calculate a diff between two byte arrays without needing the cooperation of the other side -- ie you can calculate an efficient delta to send to remote without needing remote to be online at the moment you calculate, and without needing the complete old version, only a signature of it.

A good example use of it is the rdiff tool that allows you to do exactly what I said before. A better real-world example would be duplicity [1] or rdiff-backup [2] that use librsync to de-duplicate backups, without needing access to the whole previous value, only a small signature of it.

[0] https://github.com/librsync/librsync

[1] http://www.nongnu.org/duplicity/

[2] http://www.nongnu.org/rdiff-backup/
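
To make the signature/delta idea above concrete, here is a minimal Python sketch. It is not librsync's API or wire format: the function names, the tiny block size, and the use of SHA-256 in place of a rolling weak checksum are all assumptions chosen for readability. The point it illustrates is that the sender only needs a signature of the old version to compute a delta, and the receiver only needs the old version plus that delta to rebuild the new one.

```python
import hashlib

BLOCK = 4  # hypothetical, tiny block size so the example is easy to follow


def signature(old: bytes, block: int = BLOCK) -> dict[str, int]:
    """Map the digest of each fixed-size block of the old data to its offset."""
    sig: dict[str, int] = {}
    for off in range(0, len(old), block):
        digest = hashlib.sha256(old[off:off + block]).hexdigest()
        sig.setdefault(digest, off)
    return sig


def delta(sig: dict[str, int], new: bytes, block: int = BLOCK) -> list[tuple]:
    """Build copy/data instructions using only the signature of the old data."""
    ops: list[tuple] = []
    literal = bytearray()
    i = 0
    while i < len(new):
        window = new[i:i + block]
        off = sig.get(hashlib.sha256(window).hexdigest())
        if off is not None:
            # The window matches a block the other side already has.
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", off, len(window)))
            i += len(window)
        else:
            # No match at this offset: emit one literal byte and slide forward.
            literal.append(new[i])
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the new data from the old data plus the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)


old = b"the quick brown fox jumps"
new = b"the quick red fox jumps over"
ops = delta(signature(old), new)
rebuilt = patch(old, ops)
```

A real implementation would use a cheap rolling checksum to find candidate matches and a strong hash only to confirm them; this sketch hashes every window, which is correct but slow.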


> This is exactly why sync is not a commodity. Dropbox is the very best at what they do,

The very best? I use OneDrive across all of my Windows machines and I don't even notice it exists; never had any problems. I just access all my files everywhere. If you buy a windows phone you even get a decent amount of space for free (15GB). (Though I subscribe to Office 365 so I have virtually unlimited space.)


It is a cliche, but the plural of anecdote is not data. I never had any problems with Dropbox, but had Office on OneDrive corrupt files. OneNote has also rendered some notes unreadable.

The bottom line is that errors happen. You should prepare for that and make backups.

Also, Dropbox are still among the very best when it comes to syncing. Many useful synchronization features are implemented by Dropbox, but not the competition. E.g., features that most competitors do not have:

- Modifying a large file on Dropbox will only resync modified chunks.

- Dropbox avoids re-uploads, both when uploading identical files and moving files around:

http://macography.net/2013/05/speed-test-dropbox-google-driv...

- Dropbox does LAN sync. If a machine has to download a large file and another machine on the network has the same file, chunks are provided peer to peer. This makes using large files on multiple machines or in a team much faster.

- Dropbox does streaming sync. A machine can already download chunks when another machine is still uploading:

https://blog.dropbox.com/2014/07/introducing-streaming-sync-...

Sure, OneDrive and Google Drive do have many useful functions that Dropbox does not have, such as including complete office suites. But for the original task, file syncing, Dropbox is still pretty much unbeaten.
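
As a rough illustration of the first two features listed above (resyncing only modified chunks and skipping re-uploads of identical content), here is a small Python sketch of content-addressed chunking. It is an assumption-laden toy, not Dropbox's actual protocol: the chunk size, the function name `chunks_to_upload`, and the in-memory `store` standing in for the server are all hypothetical.

```python
import hashlib

CHUNK = 4  # hypothetical chunk size; a real service would use a far larger one


def chunks_to_upload(data: bytes, known: set[str], size: int = CHUNK) -> dict[str, bytes]:
    """Return only the chunks whose digests the server does not already hold."""
    missing: dict[str, bytes] = {}
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in known:
            missing[digest] = piece
    return missing


store: set[str] = set()  # stands in for the digests the server already has

v1 = b"aaaabbbbccccdddd"
first = chunks_to_upload(v1, store)      # nothing is known yet: every chunk goes up
store.update(first)

v2 = b"aaaaXXXXccccdddd"
second = chunks_to_upload(v2, store)     # only the modified chunk goes up
store.update(second)

copy_elsewhere = chunks_to_upload(v1, store)  # identical content: nothing to upload
```

Because chunks are identified by their content rather than by file path, moving or duplicating a file costs no upload in this model.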


I think you're attributing too much of Dropbox's success to simple technical reliability. It really isn't that difficult a problem, and many services and projects do it right. I have an rsync script that has been syncing my files reliably to an offsite location for 6 years.

It's certain that Dropbox has a high quality syncing service, but there are other factors. Think, for example, how this case was handled: a fault in their core product, a breach of user trust in their service, and they understood that it needed more than a technical solution. None of this was part of their core sync reliability: it was part of a more broad quality, which is closer to their true reason for success.


> I think you're attributing too much of Dropbox's success to simple technical reliability.

I did not say anything about their reasons for success. Only what the technical advantages are compared to some of the other file sync services.

> It really isn't that difficult a problem,

Difficult enough that some of its useful features are not matched by other services yet.

> I have an rsync script that has been syncing my files reliably to an offsite location for 6 years.

That's great. But that is one-way sync and not something my parents could use. Dropbox is successful because they made sync technology that is relatively flawless to the average user. Also, there is a network effect.

In the longer term, it will be interesting to see if they survive, since Microsoft and Google have been undercutting prices heavily, and as far as I know there is no online Office suite on the horizon (only Microsoft Office integration for business users).


> Dropbox does LAN sync. If a machine has to download a large file and another machine on the network has the same file, chunks are provided peer to peer. This makes using large files on multiple machines or in a team much faster.

Doesn't the file have to exist on Dropbox's servers before it can be synced to another computer on the same LAN? The last time I looked into it, this was the case.


We use OneDrive for business at work (I'm not sure it's exactly the same as OneDrive under the hood though) and I can't say I had the same experience.

- There is no Linux/OSX client, unlike Dropbox (the OSX client only works for OneDrive and not OneDrive for Business), so it's unusable with servers or environments with a lot of OSX machines. (so most enterprise usage)

- There is a list of approved file types and if your file is not on the list, it just refuses to sync it. This is really annoying because I need to create zip files all the time to bypass this bug.

- OneDrive modifies certain file types (like Word documents, but also others) to add metadata to them, so you never know if the file you are getting is exactly the same as the one you synchronized.

- We experienced bugs in the permission system which destroyed a couple of files (thankfully we had backups).

The only positive thing with OneDrive is that it's integrated with Office 365 (it's the equivalent to Google Drive for Gmail) so you can preview files directly within your web browser on Office 365 (when it works because sometimes you just can't).

If I had a choice, I would return to Dropbox without any hesitation.


It sounds like you're using OneDrive for Business (aka SkyDrive Pro), which is something like a hosted SharePoint solution. Regular OneDrive is comparable to Dropbox, will sync any files you want, doesn't add metadata to Office documents, etc. The naming is incredibly confusing since OneDrive for Business is just about nothing like OneDrive.


Perhaps they should have called the product family "TwoDrive".


OneDrive for Business is SharePoint. I use it for my files w/ O365 and the domain to view your files is literally *.SharePoint.com


I haven't used onedrive in a while, but when I did it would take a lot of resources and syncing tiny changes tended to take over 15 minutes.


I thought there were many robust open source solutions out there and what made dropbox the winner was not that it is able to reliably sync, but that they made it easy to install and sync.


Dropbox is generally known for reliability in the face of competitors, however issues like this one and the highly embarrassing "anyone can login to anyone else's account" bug of 2011 definitely cast some doubt on using it as a sole storage option, or a solution to store any sensitive data.

Most medium-sized and large companies refuse to touch Dropbox due to these reasons, especially in the financial and medical space.

That said, I still employ it for personal use and like the product in general.


I aggressively use selective sync, and have for as long as I can remember, yet I haven't got an email like this, so it may only affect specific users.


It appears the circumstance is more specific than simply using selective sync.

> This problem occurred when the Dropbox desktop application shut down or restarted while users were applying Selective Sync settings.

So, you must be in the midst of applying selective sync settings while the app shuts down or restarts. Although I'm not sure what they mean when they say, "while users were applying selective sync settings." I'm not sure if this means:

A) Changes made in the selection dialog box, but not committed (by clicking OK).

or

B) Changes committed, but still syncing.

The former is an edge case; the latter, not so much.


Interesting. I know a few times I've had to kill the dropbox process while changing my selective sync settings before.


[deleted]



Thanks. I thought I remembered reading about this same issue some time ago. Seems like a rather slow response from Dropbox, doesn't it?


Dropbox should have understood that people are using it as a backup service. I mean, Carousel and other use cases sort of encourage and imply this. With that in mind, it's baffling they didn't have any proper backups for user data.


It is a shame that they don't offer the unlimited packrat option anymore. It's still not backup, but at the very least people would be able to recover files in such cases.

Also, if I understand correctly, Google Drive has a better policy here: removed files are just placed in the trash until you remove them from the trash. Of course, trash takes space up as well, but it protects better against such cases.

I guess Dropbox is trying to maximize its profits with its 'remove after 30 days' policy.


I have been using the "packrat" feature for more than a year, and Dropbox sent me a similar notification today to tell me they lost several thousand files, 816 of which could not be restored. They were "lost" around 8 months ago, so "packrat" didn't save me at all.

As it turns out, I have other backups of most of the files, and the rest of them weren't important. So I was lucky. Still, my confidence in the product is unlikely to recover.

I want to note that I had been aware of the "dropbox is not backup" chorus, but that argument usually is just "sync is not backup", which is sort of obvious. The packrat feature pretty much addressed this issue, so dropbox with packrat WAS a backup solution. So the lesson here is never to rely on any ONE backup provider.


I use packrat - so the files they suggest that I might have had deleted were recoverable (though none of them were an error).

If dropbox kept backups like you're suggesting, people would be complaining about how you can't delete damning files from them and that law enforcement was abusing this.


A good reminder that Dropbox is not a backup client, and should not be relied on for backups, any more than RAID should be.


I disagree. A local tape NAS is also not a good backup, (god forbid) your house can burn down. A remote service like Backblaze is also not a good backup, they could go bankrupt or a software error could corrupt all your backups.

A good backup policy uses a mixture of onsite and offsite, and Dropbox can be a (convenient) part of that. E.g., I store (non-sensitive) files in Dropbox, which gives me a certain period of undelete possibilities. My Dropbox folder is backed up on a local time machine backup. Critical parts of my Dropbox are also backed up using tarsnap, etc.

A good backup policy diversifies, and Dropbox can be part of that.


It's about as good as any online backup system. It's much much better than raid.

If you want to be picky you shouldn't rely on backups unless you have multiple independent backup systems, at least one offsite, at least one offline.


Sorry, but this is just wrong. Any system that offers multi user sync is inherently more complex than it needs to be as a backup solution. A backup should generally be

- convenient enough that you do it without thinking about it.

- technically as simple as possible, so it's easy to understand and review.

- secure.

Dropbox fulfills the first point, but not the second, and the third is debatable. Spideroak as a counterexample is just as convenient, has a pure incremental backup mode and is client-side encrypted, the gold standard of security.


You really don't need any of those to be a solid backup system. What matters is that you make backups, the backups last long enough, and there's testing of backups.

Also from what I've seen spideroak is significantly more complex than dropbox.


It can be a backup client, as long as it's not the only one. I use OneDrive + a local NAS as my backup. Both have advantages and disadvantages, but having at least one copy locally and one copy in the cloud is quite safe. Obviously these two must not be synced in realtime (because if the sync software screws something up, then you might lose everything) -- so I just keep everything on OneDrive, and every couple of weeks manually copy the files (this is still not 100% safe because of possible file corruption issues, but it's OK for me).


Exactly what I came here to say. Syncing services can be great, but should not be a replacement for proper backups.


We provide a self-hosted sync offering for businesses. It is currently used by close to 1000 businesses. It took us almost 18 months from our launch to get the sync right. There are simply too many edge cases, and the development team needs to work closely with the customers to identify and fix them. Even then our complexity is much less than Dropbox's. Our largest customer has 10000 users.

Short story: if you plan to develop a sync product from scratch, be prepared to spend at least 2 years or hire core developers from the Dropbox sync team. Even now Dropbox has issues with handling a large number of small files. Try to stuff 200000 to 300000 files and see how it works.


How old is this issue? The release notes don't spell out clearly if this bug was fixed in the past 2-3 updates (using v2.10.30 on Win 7)

https://www.dropbox.com/release_notes


I was notified of the potential data loss and checked my data on the 'personalized web page.' Of the 12,000 files that may have been affected, I found only a subfolder of a few dozen photos that may've been removed.

The problem is that when I clicked 'restore all' from within the subfolder, Dropbox restored all 12,000 files rather than just the files within the folder.

Note to DB's UX team: when you place a Restore All checkbox above the lefthand file selection column, it means 'select and restore all files on the page', not 'lift the roof off my house and dump in all the shit I spent months decluttering.'


I've been hoping for a 'restore folder to date X' for a long time too.


Ha, dropbox deleted my files the other day presumably due to this bug. I ranted on Twitter and they came back with the dropbox client can't delete files. Hmmmm. Seems I was correct :-/


I have all my digital life on dropbox, a few hundred gig.

One of my greatest fears is that thousands of files might disappear without me noticing for years.

I use selective sync, and twice I was looking for something that had disappeared and I had to restore it. I assumed maybe my wife accidentally deleted some files, but maybe it was Dropbox?

What is the solution to this anxiety?


Perhaps you can have a cron job run against your local dropbox folder(s) and do an "md5deep" reporting only differences (or sha1deep or whatever, perhaps test which uses least resources, maybe nice it heavily too). Then you could have the output report saved to a folder (not a dropbox one!). Perhaps add another job to email/alert you if the "count" of lines in the report is greater than a certain number? Crude, for sure.
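
A minimal Python sketch of that idea, for anyone who would rather not depend on md5deep: it swaps in SHA-256 from the standard library, and the function names and report layout are assumptions made up for illustration. The script that calls it (from cron or otherwise) would save each manifest outside the Dropbox folder and compare it with the previous run.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files that vanished, appeared, or changed between two runs."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

Alerting when `len(report["missing"])` exceeds some threshold would cover the "count of lines" check described above; the threshold itself is a judgment call.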


Don't put all your eggs in one basket. Keep all the same data stored on another service, as well as on a physical backup that you keep yourself. If it's really important data you might keep a copy in a safe deposit box.


I remember someone posting on HN about this, a month ago. Can't seem to find the link.



This manifested for me as a large number of "conflicted copies" everywhere inside my main visual studio solution. Thankfully, source control saved the day... But I was really annoyed at Dropbox for a little while.


As founder of cloudHQ, I have to jump into this. Software products will have bugs. And people will make mistakes. We are all human.

So even if you store data in Dropbox - it is smart to have one extra copy in some other cloud storage. Like Google Drive. Or Box. Or Egnyte. So if data is deleted in Dropbox (accidentally, maliciously, or due to a bug) you can restore it from other cloud.

Of course, cloudHQ is the system which can do that: http://chq.io/hnsc


I stopped using Dropbox because of this. I booted my system up one day and a ton of my files were deleted (locally). Luckily, this didn't affect the sync on my other systems.


Yikes. All those nines of durability that Amazon provided for Dropbox...brought to naught by a bug in Dropbox's software.


I think this calls for an aggressively distributed, user-controlled backup system. Perhaps tahoe lafs based.


[deleted]


There have been lots of mail loss and other bugs in Gmail over the years. Here's an example:

http://www.digitaltrends.com/computing/google-confirms-missi...

And privacy/government implications wise, Google is hardly any better than Rice...


There may be occasional bugs, but I don't know of any cases when Gmail actually lost emails. According to this talk [0], the bug referred to by that article caused some users' emails to be temporarily inaccessible, but all the emails were eventually recovered.

[0]: https://www.youtube.com/watch?v=eNliOm9NtCM#t=1734



