Hacker News

I think the best approach is to never save unencrypted data in the cloud: always encrypt on the client first. But that way we lose dedup capability, so we have to do everything (encryption, dedup, and compression) on the client side. I made an in-app file system dedicated to that purpose: https://github.com/zboxfs/zbox



> But that way we lose dedup capability

This depends on how secret you want your data to be. You could use block-based encryption/compression and backup. That way you can still dedup the encrypted result.

If anyone can inject data into your system and monitor the backup, they could learn when they hit collisions, but for most personal backup cases that's irrelevant.
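A minimal sketch of the block-based idea, under stated assumptions: fixed-size blocks, a per-block key derived from a user secret plus the block content, and a toy SHAKE-256 keystream standing in for a real cipher. The names `BLOCK_SIZE`, `keystream_xor`, `encrypt_block`, and `backup` are hypothetical and are not taken from any tool mentioned in this thread.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHAKE-256 keystream.

    Stand-in for a real, vetted cipher. Not for production use.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_block(secret: bytes, block: bytes) -> bytes:
    # The key depends on the secret and the block content, so equal
    # plaintext blocks always produce equal ciphertext blocks.
    block_key = hmac.new(secret, block, hashlib.sha256).digest()
    return keystream_xor(block_key, block)


def backup(secret: bytes, data: bytes, store: dict) -> list:
    """Encrypt each block, store it under its ciphertext hash, return block ids."""
    ids = []
    for off in range(0, len(data), BLOCK_SIZE):
        ciphertext = encrypt_block(secret, data[off:off + BLOCK_SIZE])
        block_id = hashlib.sha256(ciphertext).hexdigest()
        store.setdefault(block_id, ciphertext)  # duplicates collapse here
        ids.append(block_id)
    return ids
```

Backing up three blocks where the first and third are identical leaves only two entries in `store`, which is both the dedup saving and the leak: anyone watching the store can see that two blocks match. To decrypt, a real design would also have to keep each block key somewhere, for example in a manifest encrypted under the user's secret.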


I don't think encrypt-then-dedup is a safe way to protect data privacy. In that scheme, identical blocks have to produce the same ciphertext, which leaks your data patterns even though the data is encrypted. A better way, I think, is to encrypt each block with a key derived from a random seed, so that the ciphertext of identical blocks is always different.
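A rough sketch of the random-seed idea, assuming a 16-byte random salt per block, HMAC-SHA256 as the key derivation, and a toy SHAKE-256 keystream in place of a real cipher. The function names are hypothetical and this is not presented as how any particular project does it.

```python
import hashlib
import hmac
import os

SALT_LEN = 16  # assumed salt length, for illustration only


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHAKE-256 keystream XOR). Not for production use."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_block_random(master_key: bytes, block: bytes) -> bytes:
    # A fresh random salt gives every block its own derived key,
    # even when two blocks hold the same plaintext.
    salt = os.urandom(SALT_LEN)
    block_key = hmac.new(master_key, salt, hashlib.sha256).digest()
    return salt + keystream_xor(block_key, block)


def decrypt_block_random(master_key: bytes, blob: bytes) -> bytes:
    # The salt is stored in the clear at the front of the blob.
    salt, body = blob[:SALT_LEN], blob[SALT_LEN:]
    block_key = hmac.new(master_key, salt, hashlib.sha256).digest()
    return keystream_xor(block_key, body)
```

Encrypting the same block twice gives two different blobs that both decrypt to the original, so an observer cannot tell the blocks are equal. The cost is that the storage side can no longer dedup them.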


Yes, if that's more important to you than dedup savings, then you should definitely do that.


If you encrypt in a way that enables the service to dedupe, you are reusing IVs and encryption keys across items (bad) and leaking the information that two items are the same.


You must not reuse an IV between different blocks, but that does not stop you from using the same IV for the same block. Yes, you leak information about matching blocks; it's up to your use case whether you care about that.


Yes, sorry I was imprecise. I meant reusing IVs for equal blocks only, to be able to see duplicate data.


Dedup and encryption pull in opposite directions. Encrypted data should look like uniform noise from every conceivable direction. The mere fact that blocks persist between encryption runs leaks information.

I think a better approach, if you want versionable files that stay encrypted outside the client, would be to do something with diffs, similar to Git, or perhaps staged Dockerfile builds, depending on whether the data is binary or text.


Turning two blocks of cleartext with the same content into equal blocks of cyphertext is not great for encryption: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation...


I think this is a really compelling approach!

What sorts of applications have started to adopt this?

I also thought a different layer to start at would be SQLite databases, since I understand that many mobile applications use them.

How do applications handle conflicts? It looks like there is a version on files but when is a new version created? On close?

Do you see GDPR or any other compelling event that will cause applications to consider this sort of cloud storage?


I think any application that needs to store confidential files, locally or remotely, can adopt it. Web and mobile apps are probably the best fit at the moment.

Since there is transaction control, conflict handling should be straightforward. That is, the thread that holds the write lock can write to the file exclusively; each write is a transaction, and a commit creates a new permanent version.

GDPR might be a good reason, but I think the use case is more general. Any app that needs to store confidential data can use this, whether the data is local or in the cloud.
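The write-lock and commit behaviour described above might look roughly like the sketch below. It is an illustration only: `VersionedFile` and `write_tx` are hypothetical names, not the zbox API, and everything is kept in memory.

```python
import threading
from contextlib import contextmanager


class VersionedFile:
    """In-memory file where every committed write becomes a new version."""

    def __init__(self):
        self._lock = threading.Lock()   # one writer at a time
        self._versions = [b""]          # version 0 is the empty file

    @contextmanager
    def write_tx(self):
        # Holding the lock gives the writer exclusive access.
        with self._lock:
            staged = bytearray(self._versions[-1])
            yield staged
            # Reached only if the body did not raise: commit a new version.
            self._versions.append(bytes(staged))

    def read(self, version: int = -1) -> bytes:
        return self._versions[version]

    @property
    def version_count(self) -> int:
        return len(self._versions)
```

A writer would use `with f.write_tx() as buf: buf.extend(b"hello")`. If the body raises, the staged changes are dropped and no new version appears, while earlier versions stay readable through `read(n)`.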



