
This is my biggest concern with the Dropbox platform, summed up in the filing:

* Our business could be damaged, and we could be subject to liability if there is any unauthorized access to our data or our users’ content, including through privacy and data security breaches.

They have made progress. They managed to get SOC 2 compliance for all of their offerings. They now offer HIPAA-compliant hosting as well.

Not that long ago, though (circa 2013), I remember a series of articles that made it clear that Dropbox employees had access to customer data.

That spooked me enough to recommend folks pair it with https://www.sookasa.com/ if they were going to use it.




Dropbox, GDrive, Box, OneDrive, etc. aren't using zero-knowledge encryption, so it must be technically feasible for them to access user data. If a company has a forgotten password reset feature, it's a good sign that it's possible for them to access your data.
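
To make that concrete: in a zero-knowledge design, the encryption key is derived from the password itself, so a conventional "forgot password" reset is impossible without losing the data. A minimal sketch (Python standard library; the passwords and PBKDF2 iteration count are made up for illustration):

    import hashlib, os

    # The salt can live server-side; it's useless without the password.
    salt = os.urandom(16)

    def derive_key(password):
        # Iteration count is illustrative, not a recommendation.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

    old_key = derive_key("hunter2")
    new_key = derive_key("password chosen after a reset")

    # A new password derives a different key, so anything encrypted under
    # old_key is unrecoverable. A service that can reset your password and
    # still show you your files must hold a key to your data somewhere.
    assert old_key != new_key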

You have to trust the company providing the service, right? Of course in practice, accessing user data should be tightly controlled and require good business reasons and levels of approval.

Zero-knowledge alternatives like SpiderOak exist, but this approach makes them sacrifice features. (And, judging by market share, it doesn't appear very popular.)


It's not as black and white as all that. In the early days, it was probably a trivial operation for any employee to examine any customer's Dropbox content. Hopefully these days the prod machines and backups are walled off from all but a rotating team of SREs, whose own actions on them are subject to audit.


Early in my days at Syncplicity (a Dropbox competitor), I specced out what was needed for true client-side encryption, with no ability to decrypt on the server. It's very easy to do from a technical standpoint.

(We solve the problem by letting our large customers run their own servers, with their own authentication via single sign-on.)

The problem is that the user experience for client-side encryption is awful! Every shared folder will need its own key, and users would need to manage and share their keys outside of our system. That is not sustainable.

But then the major feature set breaks down. Want to access your files in a browser? Not with client-side encryption. Want to email someone a hyperlink to a file? Not with client-side encryption.

The major lesson is that the world operates on trust. We can only stay in business if our customers trust us.


> The problem is that the user experience for client-side encryption is awful! Every shared folder will need its own key, and users would need to manage and share their keys outside of our system. That is not sustainable.

> But then the major feature set breaks down. Want to access your files in a browser? Not with client-side encryption. Want to email someone a hyperlink to a file? Not with client-side encryption.

I do all of this with Boxcryptor. I might be misunderstanding you - do you mean decrypt it without first downloading it from the browser? Because yes, that’s not strictly possible.

But Boxcryptor implements a small wrapper around directories and generates a public/private key pair tied to email addresses. You can client-side encrypt a file with your - or anyone else’s - public key by moving the file into the directory. You can also change the file’s encryption to add or revoke access by multiple users.

If you wrap Boxcryptor around your local Google Drive, Dropbox, Box, etc. directory, it automatically client-side encrypts, then uploads new files. Then you can share a hyperlink to share encrypted files without exchanging keys with anyone. The usability is so great I’ve been able to use this with non-technical clients. You can even use your own key pairs.
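
For what it's worth, the general pattern described there looks roughly like the following sketch (this is not Boxcryptor's actual code; it's a generic per-file key wrapped under each user's RSA public key, written with Python's cryptography package):

    import os
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    alice = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    bob = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    def encrypt_file(plaintext, recipients):
        # Random symmetric key per file; one wrapped copy per authorized user.
        file_key = AESGCM.generate_key(bit_length=256)
        nonce = os.urandom(12)
        ciphertext = AESGCM(file_key).encrypt(nonce, plaintext, None)
        wrapped = {name: pub.encrypt(file_key, OAEP)
                   for name, pub in recipients.items()}
        return nonce, ciphertext, wrapped

    def decrypt_file(nonce, ciphertext, wrapped, name, private_key):
        file_key = private_key.decrypt(wrapped[name], OAEP)
        return AESGCM(file_key).decrypt(nonce, ciphertext, None)

    recipients = {"alice": alice.public_key(), "bob": bob.public_key()}
    nonce, ct, wrapped = encrypt_file(b"quarterly report", recipients)
    assert decrypt_file(nonce, ct, wrapped, "bob", bob) == b"quarterly report"

Granting access means wrapping the file key for one more public key; truly revoking access to future versions requires re-encrypting under a fresh file key.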


There's also no deduplication with client-side encryption.


In the traditional model, yes. But if you're willing to make sacrifices... see previous discussion on this topic when it came up with Mega, de-duplication, and client-side encryption: https://news.ycombinator.com/item?id=5084261


Doesn’t homomorphic encryption allow for this?

http://ieeexplore.ieee.org/document/7255226/


No. It doesn't.


Can you explain?


The whole point of encryption is that you cannot meaningfully compare two pieces of plaintext once they're encrypted.

Homomorphic encryption doesn't change that.

The only way to compare plaintexts is to decrypt the whole thing. So either you must trust a centralized org (like Dropbox today), or you must trust a single centralized key (which could be done with homomorphic encryption).

(Also, even the best homomorphic encryption schemes still make small programs take days to execute.)
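
To see why comparison fails: with any semantically secure cipher, the same plaintext encrypts to a different ciphertext every time. A quick sketch (Python's cryptography package, with AES-GCM standing in for whatever cipher a real service would use):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    plaintext = b"the same file, byte for byte"

    ct1 = AESGCM(key).encrypt(os.urandom(12), plaintext, None)
    ct2 = AESGCM(key).encrypt(os.urandom(12), plaintext, None)

    # Same key, same plaintext, fresh random nonces: the ciphertexts differ,
    # so a server holding only ct1 and ct2 can't tell the files are equal.
    assert ct1 != ct2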


Consider a scheme in which:

Each user generates a symmetric "user key", kU.

The plaintext of each file (or, without loss of generality, block of data, etc.), pFile, is encrypted with a randomly generated symmetric key, kFile, producing the ciphertext cFile. pFile is also hashed with a cryptographically strong hash, producing hpFile. kFile is then encrypted with hpFile (used as a key), producing ckFile. The user encrypts hpFile with kU, producing chpFile. Finally, the user takes the first N bits of hpFile (for N on the order of, say, 16 or 32), producing hpFileTrunc. The user then submits hpFileTrunc to the server.

The server is, semantically, just a list of 3-tuples: (cFile, ckFile, hpFileTrunc).

The server sees if it knows of the existence of records with the same hpFileTrunc value as the client's submission. If so, it returns them to the client.

The client then tries, for each record returned by the server, decrypting ckFile with the client's hpFile value, potentially producing kFile. If this is successful, the client then decrypts cFile with kFile, producing pFile. Finally, it compares this pFile to the original. If it matches, a match has been found, and the client exits the loop. If not (or if either of the two decryption steps failed), it continues to the next record the server returned. If there are no more records, the client instead submits the tuple (cFile, ckFile, hpFileTrunc) to the server, which stores it.

Finally (whether or not a match was found), the client stores chpFile locally, to be used when retrieving the file.

To retrieve the file, the user decrypts chpFile with kU, producing hpFile. They truncate hpFile, producing hpFileTrunc, and submit it to the server. They perform the same process described earlier to retrieve the matching pFile.

(Note: truncation may also be replaced by, or combined with, a second round of hashing.)
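
For concreteness, here is a runnable sketch of the scheme (my own rendering, assuming SHA-256 and nonce-prefixed AES-GCM as the primitives, with the server modeled as an in-memory list; variable names follow the description above):

    import hashlib, os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    TRUNC_BYTES = 4  # "first N bits of hpFile", here N = 32
    server = []      # semantically: a list of (cFile, ckFile, hpFileTrunc)

    def seal(key, data):
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, data, None)

    def open_(key, blob):
        return AESGCM(key).decrypt(blob[:12], blob[12:], None)

    def store(pFile, kU):
        hpFile = hashlib.sha256(pFile).digest()
        hpFileTrunc = hpFile[:TRUNC_BYTES]
        for cFile, ckFile, trunc in server:   # records the server returns
            if trunc != hpFileTrunc:
                continue
            try:
                kFile = open_(hpFile, ckFile)     # only works knowing pFile
                if open_(kFile, cFile) == pFile:  # confirmed duplicate
                    break
            except InvalidTag:
                continue  # truncated-hash collision, not our file
        else:
            kFile = AESGCM.generate_key(bit_length=256)  # no match: upload
            server.append((seal(kFile, pFile), seal(hpFile, kFile),
                           hpFileTrunc))
        return seal(kU, hpFile)  # chpFile, kept client-side for retrieval

    def retrieve(chpFile, kU):
        hpFile = open_(kU, chpFile)
        for cFile, ckFile, trunc in server:
            if trunc == hpFile[:TRUNC_BYTES]:
                try:
                    return open_(open_(hpFile, ckFile), cFile)
                except InvalidTag:
                    continue
        raise KeyError("no matching record")

    kU = AESGCM.generate_key(bit_length=256)
    chpFile = store(b"big shared movie file", kU)
    store(b"big shared movie file", os.urandom(32))  # 2nd user: deduplicated
    assert len(server) == 1
    assert retrieve(chpFile, kU) == b"big shared movie file"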

With this scheme, assuming secure primitives (authenticated encryption and hashing), I don't believe it's possible to learn any information about a file unless you already have its contents.

So the server can tell if you're accessing (storing or retrieving) a particular file if and only if the server knows what it's looking for.

TL;DR: you can totally construct a scheme that allows meaningful comparison of plaintexts!

But... this is probably a bad thing. Comparison of plaintexts is a vulnerability: the server being able to see who's storing a particular "bad" file has a real impact on privacy. And likely more subtle impacts, too...


The whole point is to allow for comparison of large plaintext files that are stored by many users. Think of MP3s, or large AVI files, or, say, a Linux kernel image, or ...

> The server sees if it knows of the existence of records with the same hpFileTrunc value as the client's submission. If so, it returns them to the client.

And by doing this, it provides a way for clients to verify whether any user on the file storage server has this file. So if I wanted to know whether your Mozilla Thunderbird mail store has a message I have the source of, I simply try to store it and get back the duplicate records.

Most people would consider this extremely unacceptable.

> The client then tries, for each record returned by the server, decrypting ckFile with the client's hpFile value, potentially producing kFile. If this is successful, the client then decrypts cFile with kFile, producing pFile. Finally, it compares this pFile to the original. If it matches, a match has been found, and the client exits the loop. If not (or if either of the two decryption steps failed), it continues to the next record the server returned. If there are no more records, the client instead submits the tuple (cFile, ckFile, hpFileTrunc) to the server, which stores it.

Why would the client have the keys to files stored by other users?

Unless you mean that you can only deduplicate within a single client, in which case that's of much more limited use (and I might add, your encryption scheme is way more complex than it needs to be).


> And by doing this, it provides a way for clients to verify whether any user on the file storage server has this file. So if I wanted to know whether your Mozilla Thunderbird mail store has a message I have the source of, I simply try to store it and get back the duplicate records.

Yes. This is the reason you don't want this property (being able to deduplicate encrypted files)!

But you can provide it, while still providing meaningful security against other attacks.

The client can get the keys to files stored by other users because each file key is encrypted under the hash of the plaintext, and the client can hash its own plaintext when it has the file.

(Note a trivial modification to this scheme, solely client-side, allows for certain files to be totally secure, with the cost of them being exempt from deduplication)


> The client can get the keys to files stored by other users because each file key is encrypted under the hash of the plaintext

Personally, I consider ensuring that only explicitly authorized people have the key to be the whole point of security. And you're suggesting this as a solution to the problem that organizations providing file storage could see what files you're storing.

Under this scheme, it wouldn't just be that organization, but everybody who is a client, that could see what files you're storing (or at least verify if you're storing a particular file or not)

So I find your assessment:

> But you can provide it, while still providing meaningful security against other attacks.

Very dubious indeed, especially given the context of securing centralized file storage, where the whole point would be to deny others access.

I mean, it's a true statement, because you don't specify what "other attacks" are.

Given that this system leaks the plaintext of your files, I find it strictly worse than just giving Dropbox or Microsoft access to my files.


> Under this scheme, it wouldn't just be that organization, but everybody who is a client, that could see what files you're storing (or at least verify if you're storing a particular file or not)

You can do this today with Dropbox or whatever else - anything that does deduplication, if it saves bandwidth by not asking for files it already has.

You can't tell who is storing a particular file - only whether anybody is. Does this leak information and impact privacy? Yes! But it still provides other useful properties.

If you have a copy of a file, you can see if anybody else does - a boolean value. (And a malicious server, if it logs, can tell who does.) If you don't have a copy of a file, you can learn absolutely nothing about it.

So, for example, if a user uploads a, uh, personal image to the service - with Dropbox, in theory (they likely have strong organizational and technical controls against this sort of thing, mind you), if the server is malicious, it can view that image.

With this scheme, the server can't.

On the other hand, if you, say, save a file containing only your social security number - or a similar low-entropy value - the server can crack the hash and decrypt that file. That's the price you pay for being able to deduplicate.
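
A sketch of that attack (reusing the nonce-prefixed AES-GCM helper from the scheme sketch upthread; the SSN format and trimmed search space are only for illustration):

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    def open_(key, blob):
        return AESGCM(key).decrypt(blob[:12], blob[12:], None)

    def crack_record(cFile, ckFile):
        # Offline guessing: hash each candidate plaintext and try it as the
        # key wrapping kFile. The full 9-digit SSN space is only 10^9 tries;
        # the range is trimmed here so the example finishes quickly.
        for n in range(1_000_000):
            guess = f"{n:09d}".encode()
            try:
                kFile = open_(hashlib.sha256(guess).digest(), ckFile)
                return open_(kFile, cFile)  # plaintext recovered
            except InvalidTag:
                continue
        return None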

(Perhaps one could only deduplicate large files - thus handling the case of movies, music, Ubuntu ISOs, large system files, etc. To implement selective deduplication: if you want a file not to be deduped, replace all uses of its hash with a unique random value that identifies the file. The server requires no modification.)


For a single user with no need to share access, you can also just stick everything in a VeraCrypt vault stored on Dropbox. Dropbox seems pretty good at updating only the parts of the vault that have changed, versus the entire (sometimes huge) vault file. I've heard OneDrive updates the entire vault file every time, although I haven't experimented with it myself.


Dropbox's client uploads binary diffs to make updates to things like your VeraCrypt vault efficient. It was using a modified librsync, last I heard. Sounds like OneDrive is missing this feature.

https://www.dropbox.com/help/syncing-uploads/upload-entire-f...

https://github.com/dropbox/librsync
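
For intuition, a much-simplified fixed-block sketch of delta sync (Python; real librsync uses a rolling checksum so that insertions don't shift every later block, which this deliberately skips):

    import hashlib

    BLOCK = 4096

    def signature(data):
        # Per-block hashes of the previously synced file.
        return [hashlib.sha256(data[i:i + BLOCK]).digest()
                for i in range(0, len(data), BLOCK)]

    def changed_blocks(old_sig, new_data):
        # (index, block) pairs the client would actually upload.
        out = []
        for i in range(0, len(new_data), BLOCK):
            block, j = new_data[i:i + BLOCK], i // BLOCK
            if j >= len(old_sig) or hashlib.sha256(block).digest() != old_sig[j]:
                out.append((j, block))
        return out

    old = bytes(100_000)                           # yesterday's vault file
    new = old[:50_000] + b"x" * 16 + old[50_016:]  # small in-place edit
    delta = changed_blocks(signature(old), new)
    assert len(delta) == 1                         # one 4 KiB block re-sent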


The point of that section of an S-1 is to disclose every possible risk, so that investors can't accuse you of hiding risks later on.

So, when quoted out of context, it sounds really extreme.


I wonder how spideroak.com compares.



