What is the difference exactly?

cookiecaper · on Feb 11, 2017

A snapshot could be a backup depending on what you're calling a snapshot, but yeah, in general, to be a backup things need to have these features:

1. stored on separate infrastructure so that obliteration of the primary infrastructure (AWS account locked out for non-payment, password gets stolen and everything gets deleted, datacenter gets eaten by a sinkhole, etc.) doesn't destroy the data.

2. offline, read-only. This is where most people get confused.

Backups are unequivocally NOT a live mirror like RAID 1, a slightly delayed replication setup like most databases provide, or a double-write system. These aren't backups because they can't help you recover from human errors, which include obvious things like dropping the wrong table, but also less obvious things, like a subtle bug that corrupts/damages some records and may take days or weeks to notice. Your standbys/mirrors are going to copy both the obvious and the non-obvious errors before you have a chance to stop them.

This is one of the most important things to remember. Redundancy is not backup. Redundancy is redundancy and it primarily protects against hardware and network failures. It's not a backup because it doesn't protect against human or software error.
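To make the mirror-vs-backup distinction concrete, here's a toy sketch. The dictionaries and the `apply_change` helper are made up for illustration and stand in for a real primary, replica, and offline copy; no actual database or replication API is involved.

```python
import copy

primary = {"users": ["alice", "bob"], "orders": [101, 102]}

# A backup is a point-in-time copy that nothing writes to afterwards.
backup = copy.deepcopy(primary)

# A mirror/replica receives every change made to the primary.
mirror = copy.deepcopy(primary)


def apply_change(change):
    """Apply a change to the primary and, as replication would, to the mirror."""
    change(primary)
    change(mirror)


# Human error: the wrong table gets dropped.
apply_change(lambda db: db.pop("users"))

print("users" in primary)  # False
print("users" in mirror)   # False: the mistake was replicated
print("users" in backup)   # True: the offline copy still has the table
```

The mirror faithfully reproduces the mistake; only the copy that stopped receiving writes can bring the table back.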

3. regularly verified by real-world restoration cases; backups can't be trusted until they're confirmed, at least on a recurring, periodic basis. Automated alarms and monitoring should be used to validate that the backup file is present and that it is within a reasonable size variance between human-supervised verifications. Automatic logical checksums like those suggested by some other users in this thread (e.g., run pg_restore on a pg_dump to make sure that the file can be read through) are great too and should be used whenever available.
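The automated presence-and-size check mentioned above could look roughly like this. It's a minimal sketch: the function name `check_backup`, the 24-hour age limit, and the 20% tolerance are all assumptions for illustration, and it is no substitute for an actual test restore.

```python
import os
import time


def check_backup(path, last_verified_size, max_age_seconds=86400, tolerance=0.2):
    """Return a list of problems found with the backup file at `path`.

    An empty list means the file passed these basic checks. All thresholds
    here are illustrative assumptions, not recommendations.
    """
    if not os.path.isfile(path):
        return ["backup file is missing"]

    problems = []
    stat = os.stat(path)

    # A stale file usually means the backup job has silently stopped running.
    if time.time() - stat.st_mtime > max_age_seconds:
        problems.append("backup is older than expected")

    # Compare against the size recorded at the last human-supervised restore.
    low = last_verified_size * (1 - tolerance)
    high = last_verified_size * (1 + tolerance)
    if not low <= stat.st_size <= high:
        problems.append("backup size is outside the expected range")

    return problems
```

A cron job or monitoring agent could call this after each backup run and raise an alarm whenever the returned list is non-empty.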

4. complete, consistent, and self-contained archive up to the timestamp of the backup. Differential or incremental backups count as long as the full chain needed for a restoration is present.
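For backups that depend on earlier backups, "the full chain is present" can be checked mechanically. A rough sketch, assuming a hypothetical catalog that maps each backup id to the id of the backup it depends on (`None` for a full backup); real backup tools keep this metadata in their own formats.

```python
def restore_chain(backups, target_id):
    """Return backup ids from the full backup up to `target_id`.

    `backups` maps a backup id to its parent id; a full backup has parent
    None. Returns None if a link is missing or the chain loops.
    """
    chain = []
    seen = set()
    current = target_id
    while current is not None:
        if current not in backups or current in seen:
            return None  # missing link or cycle: not restorable
        seen.add(current)
        chain.append(current)
        current = backups[current]
    chain.reverse()  # restore order: full backup first
    return chain


catalog = {"full-sun": None, "diff-mon": "full-sun", "diff-tue": "diff-mon"}
print(restore_chain(catalog, "diff-tue"))  # ['full-sun', 'diff-mon', 'diff-tue']

del catalog["diff-mon"]
print(restore_chain(catalog, "diff-tue"))  # None
```

If the function returns None for your newest backup, that backup alone doesn't meet point 4, however intact the file itself looks.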

This excludes COW filesystem snapshots, etc., because they're generally dependent on many internal objects dispersed throughout the filesystem; if your FS gets corrupted, it's very likely that some of the data referenced by your snapshots will be corrupted too (snapshots are only possible because COW semantics mean that the data does not have to be copied, just flagged as in use in multiple locations). If you can export the COW FS snapshot as a whole, self-contained unit that can live separately and produce a full and valid restoration of the filesystem, then that exported thing may be a backup, but the internal filesystem-local snapshot isn't (see also point 1).

fh973 · on Feb 11, 2017

Backups protect against bugs and operator errors and belong on a separate storage stack to avoid all correlation, ideally on a separate system (software bugs) with different hardware (firmware and hardware bugs), in a different location.

chousuke · on Feb 11, 2017

The purpose of a backup is to avoid data loss in the scenarios included in your risk analysis. For example, your storage system could corrupt data, an engineer could forget a WHERE clause in a delete, or a large falling object could hit your data center.

Snapshots will help you against human error, so they are one kind of backup (and often very useful), but if you do not at least replicate those snapshots somewhere else, you are still vulnerable to data corruption bugs or hardware failures in the original system. Design your backup strategy to meet your requirements for risk mitigation.

gregmac · on Feb 11, 2017

I'd also add not just different location but different account.

If your cloud account, datacenter/colo, or office is terminated, hacked, burned down, or swallowed by a sinkhole, you don't want your backups going with it.

Cloud especially: even if you're on AWS and have your backups on Glacier + S3 with replication to 7 datacenters on 3 continents... If your account goes away, so do your backups (or at least your access to them).