If you're doing backups for your business, I've written about how to properly encrypt backups[1] and how to use Google Compute Engine for backups[2]. I'm working on write-ups for AWS and Azure that should post within the next few weeks.

[1] https://summitroute.com/blog/2016/12/25/creating_disaster_re...

[2] https://summitroute.com/blog/2016/12/25/using_google_for_bac...




This is certainly one way to do backups. Two things which come to mind on first reading:

1. You're encrypting backups but not authenticating them; someone with access to your archives could trivially truncate an archive or replace one archive with another, and there's a nontrivial chance that they could splice parts of different archives together (see the sketch below for one way to add authentication).

2. Every archive you're creating is stored as a separate full archive; this will probably limit you to keeping a handful of archives at once. With a more sophisticated archival tool, you could store tens of thousands of backups in the same amount of space.
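
For reference, adding authentication on top of a plain openssl pipeline can be as simple as an HMAC over the ciphertext, roughly like this (a sketch only; the key file and archive names are placeholders):

  # after encrypting, HMAC the ciphertext with a key separate from the encryption key
  openssl dgst -sha256 -hmac "$(cat hmac.key)" \
      -out backup.tar.gz.enc.hmac backup.tar.gz.enc

  # before restoring, recompute the HMAC and compare it to the stored one
  openssl dgst -sha256 -hmac "$(cat hmac.key)" backup.tar.gz.enc \
      | diff - backup.tar.gz.enc.hmac && echo "archive authenticated"

(Passing the key via -hmac exposes it in the process list, so treat this strictly as a sketch; the HMAC key should also be distinct from the encryption key.)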


These are both accurate.

For 1, I ensure that an attacker cannot modify my archives after they've been uploaded by giving the backup service "put"-only privileges. Unfortunately that isn't possible with GCE, as I point out in a warning banner in the article, but it is with AWS, which I'll cover in the upcoming post. My use case is primarily to have a backup in the event of a devops mistake or a malicious attacker (ransomware), so I assume anyone with write access to my archives would simply delete them; authenticating the archives therefore isn't as big a concern, although it would still be a good idea, if only to ensure the files haven't been corrupted in some other way.
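
To make the "put"-only idea concrete, an IAM policy granting nothing but s3:PutObject looks roughly like this (a sketch with hypothetical bucket and user names, not the policy from the upcoming AWS write-up):

  # grant the hypothetical backup-uploader user put-only access to one bucket
  aws iam put-user-policy --user-name backup-uploader --policy-name put-only \
      --policy-document '{"Version": "2012-10-17", "Statement": [{
          "Effect": "Allow",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::my-backup-bucket/*"}]}'

One caveat: s3:PutObject alone still allows overwriting an existing key, so it pairs well with bucket versioning if overwrites are also a concern.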

For 2, my storage costs are currently low (100GB of archives per day, which works out to pennies per day for all of them), but eventually I plan on sending just diffs. I also wanted to create and send backups in the simplest possible way, to help people get up and running as fast as possible, which meant limiting myself to the "openssl" command and other basic tools. The smarter solutions I'm aware of are either tied to a service (e.g. tarsnap) or don't keep the data encrypted at the backup location.
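
"Openssl plus basic commands" here means roughly a pipeline like the following (a sketch only; the bucket, key file, and paths are placeholders, and the article's exact commands may differ):

  # archive, encrypt, and stream straight into the bucket
  tar -czf - /var/backups | \
      openssl enc -aes-256-cbc -salt -pass file:/root/backup.key | \
      gsutil cp - gs://my-backup-bucket/backup-$(date +%F).tar.gz.enc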


Not bad! Can I recommend that you try out the per-object storage classes and lifecycle policies? Particularly if folks are going to be effectively rsyncing things, it's really handy to minimize the combination of retrieval fees and storage fees (this really depends on the manner of backup, incremental versus full, etc.).
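
For example, a lifecycle config that demotes older archives and eventually expires them might look like this (bucket name and age thresholds are just placeholders; S3 has an equivalent mechanism):

  $ cat lifecycle.json
  {"rule": [
     {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}},
     {"action": {"type": "Delete"}, "condition": {"age": 365}}]}
  $ gsutil lifecycle set lifecycle.json gs://my-backup-bucket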

Also not mentioned: each service supports object versioning, which, for backups that aren't block-based, can serve as an alternative DR plan (e.g., don't allow certain users to delete the last version).
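
Turning versioning on is a one-liner on either side (bucket names are placeholders):

  # GCS
  gsutil versioning set on gs://my-backup-bucket
  # S3
  aws s3api put-bucket-versioning --bucket my-backup-bucket \
      --versioning-configuration Status=Enabled

With versioning on, an overwrite or delete leaves the previous version recoverable unless someone with the right permissions explicitly removes the old versions.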

All in all, a good start (complete with helpful screenshots!). Looking forward to the guides on S3 and Azure Blobs.

Disclosure: I work on Google Cloud.



